WebGrep greps a Web page and, if required, its related resources, with additional features for preprocessing these resources and deriving new ones from them.
For this purpose, WebGrep:
- Relies on the common grep tool.
- Mimics every option of this tool except -r (recursive) as, by design, WebGrep is not aimed at crawling Web pages.
- Gets page-related resources like images, scripts and style sheets.
- Holds extra features for applying transformations on these resources in order to get more relevant results.
$ webgrep Welcome https://github.com
Welcome home, <br>developers
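To make the idea concrete, here is a minimal Python 3 sketch of grepping a page and listing its related resources. It is not WebGrep's implementation: the names `ResourceCollector`, `page_resources`, `grep_text` and `SAMPLE_HTML` are hypothetical, it only uses the standard library, and it works on an HTML string instead of downloading anything.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts and style sheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script"):
            target = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            target = attrs.get("href")
        else:
            target = None
        if target:
            # Resolve relative references against the page URL.
            self.resources.append(urljoin(self.base_url, target))


def page_resources(html, base_url):
    """Return the absolute URLs of the resources referenced by `html`."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


def grep_text(pattern, text):
    """Return the lines of `text` matching the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body>
<h1>Welcome home</h1>
<img src="https://cdn.example.com/logo.png">
</body></html>"""

if __name__ == "__main__":
    for line in grep_text("Welcome", SAMPLE_HTML):
        print(line)
    for url in page_resources(SAMPLE_HTML, "https://example.com/docs/"):
        print(url)
```

A real tool would then download each listed resource and apply the same matching to its content.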
- Platform: Linux
- Python: 2 or 3
On several occasions in my professional life, I needed to search for keywords in the sources of various Web pages and in their related resources such as scripts and images. After looking through some projects on GitHub, I realized there was no consistent tool providing grep-like functionality for Web pages.
In the remainder of this documentation, the following terms are used:
Resource: The main entity; can be a Web page, an image, a script, a style sheet or anything else.
Tools: Alternative tools that can be used to derive new resources based on the resource type; can be tesseract-ocr for an image, ...
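The relation between a resource type and a tool can be pictured as a lookup table. The sketch below is an assumption for illustration only, not how WebGrep selects its tools: `DERIVATION_TOOLS` and `derive_command` are hypothetical names, the resource type is guessed from the file extension, and the only entry maps images to the `tesseract` command line, which prints recognized text when given `stdout` as output base.

```python
import mimetypes

# Hypothetical table: MIME type prefix -> command template for deriving text.
DERIVATION_TOOLS = {
    "image/": ["tesseract", "{path}", "stdout"],
}


def derive_command(path):
    """Return the command that would derive a new resource from `path`,
    or None when no tool is registered for its type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return None
    for prefix, template in DERIVATION_TOOLS.items():
        if mime.startswith(prefix):
            return [part.format(path=path) for part in template]
    return None


if __name__ == "__main__":
    print(derive_command("logo.png"))
    print(derive_command("notes.txt"))
```

The returned list could be passed to `subprocess.run` to produce text that is then searched like any other resource.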