A typical WebGrep session widens the search scope step by step.
First, grep only the Web page:
$ webgrep keyword http://example.com
Then grep the Web page and its same-origin resources:
$ webgrep -r keyword http://example.com
Now grep the Web page with all its related resources:
$ webgrep -R keyword http://example.com
If relevant, also grep the HTTP headers (e.g. for inspecting a cookie):
$ webgrep cookie http://example.com --include-headers
WebGrep processes its input resource by resource: it first greps the Web page itself, then downloads and greps each related resource in turn, so that it behaves just like the regular grep tool. For this purpose, files are downloaded and saved to a temporary folder (/tmp/webgrep by default; this can be changed, see the help of the tool).
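The resource-by-resource flow described above can be illustrated with a minimal Python sketch. This is not WebGrep's actual code: the function name `grep_resources`, the injected `fetch` callable, and the `resource-N` file naming are all assumptions made for illustration. It only shows the general idea of saving each resource to a temporary folder and then grepping the saved file line by line.

```python
import re
import tempfile
from pathlib import Path


def grep_resources(pattern, urls, fetch, tmp_dir=None):
    """Yield (url, line_number, line) for every matching line.

    Hypothetical helper, not part of WebGrep. `fetch` is any callable
    that maps a URL to its text content (an HTTP client in real use).
    """
    regex = re.compile(pattern)
    # Use the given folder, or create a fresh temporary one.
    folder = Path(tmp_dir or tempfile.mkdtemp(prefix="webgrep-sketch-"))
    folder.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls):
        # Save the resource locally first, then grep the saved file,
        # one resource at a time.
        local = folder / f"resource-{index}"
        local.write_text(fetch(url), encoding="utf-8")
        with local.open(encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield url, number, line.rstrip("\n")
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library; in a quick experiment it can simply be a dictionary lookup over canned page contents.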
In terms of a class diagram, this gives a Resource class that recursively composes other instances of itself, as depicted in the following figure:
For a deeper understanding of how WebGrep works with this class, please see the next section.
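As a rough illustration of the recursive composition described above, the following Python sketch models a resource that holds other resources. The attribute names (`url`, `content`, `children`) and the helpers `walk`, `origin` and `select` are assumptions for this example and do not necessarily match WebGrep's real Resource class. The `same_origin_only` switch loosely mirrors the difference between `-r` and `-R`, approximating an origin by the URL scheme and network location.

```python
from dataclasses import dataclass, field
from typing import Iterator, List
from urllib.parse import urlsplit


@dataclass
class Resource:
    """Hypothetical resource node: a page or one of its related files."""
    url: str
    content: str = ""
    children: List["Resource"] = field(default_factory=list)

    def walk(self) -> Iterator["Resource"]:
        # Depth-first traversal: this resource first, then each child
        # together with its own children.
        yield self
        for child in self.children:
            yield from child.walk()


def origin(url: str):
    # Approximation of a Web origin: scheme plus host[:port].
    parts = urlsplit(url)
    return parts.scheme, parts.netloc


def select(page: Resource, same_origin_only: bool) -> Iterator[Resource]:
    # Keep every resource, or only those sharing the page's origin.
    for resource in page.walk():
        if not same_origin_only or origin(resource.url) == origin(page.url):
            yield resource
```

Note that this `select` compares every resource with the root page, so a same-origin file referenced from a cross-origin stylesheet is still kept; whether the real tool behaves this way is not stated in this section.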