2

Is there a command-line HTTP proxy that outputs to STDOUT so I can use it with Unix pipes?

I want to do something like this:

  1. Start the proxy at the command-line:
    $ proxy -p 8888 | grep "Text I'm interested in" > ~/my_log.txt
  2. Configure my browser to use the HTTP proxy on port 8888.
  3. Browse the Internet. As I browse, HTML is grepped and saved to my_log.txt
  4. CTRL-C when I'm done.

UPDATE: I hadn't thought about this before, but the solution needs to handle gzipped/deflated content correctly.

chicks
richardkmiller

4 Answers

1

Can you skip the proxy, and just use tcpdump with the -A option and a filter?

# capture all traffic to or from port 80
tcpdump -qni eth0 -s 0 -A port 80

# capture port 80 traffic to or from host 192.168.32.1
tcpdump -qni eth0 -s 0 -A port 80 and host 192.168.32.1

# capture port 80 traffic and display only the interesting lines
tcpdump -qni eth0 -s 0 -A port 80 | grep "Text I'm interested in"
Zoredache
  • You could probably run Squid in conjunction with this (or I don't see why you couldn't use netcat as an http proxy). – gravyface Mar 03 '11 at 00:42
  • I was thinking you could use tshark instead of tcpdump to eliminate the need for grep, but there doesn't seem to be a display filter (http://www.wireshark.org/docs/dfref/h/http.html) for the body of an HTTP response. – sciurus Mar 03 '11 at 05:28
  • I've used tcpdump before, and think it's a great tool, but I was hoping for something HTTP specific. For example, an HTTP-specific tool could uncompress gzip/deflate'd content before sending it to STDOUT. The packet capture tools like tcpdump and ngrep are showing me compressed content. – richardkmiller Mar 04 '11 at 17:47
  • @richardkmiller, I understand your desire to get the uncompressed content, but I don't think an HTTP proxy would do what you want either. I don't believe they typically do anything about decompressing the HTML body. A Google search for "http capture gzip decompression" (http://www.google.com/search?q=http+capture+gzip+decompression) turned up this page, which looks close to your requirements: http://bramp.net/blog/follow-http-stream-with-decompression – Zoredache Mar 04 '11 at 18:36
  • You may be right. In the past, I've used Apache as a proxy and used ExtFilterDefine to define a script through which to pass the content, but I had to turn off accept-encoding in my browser, as mentioned by sciurus below. I was hoping there was an easier way, especially from the command-line, but perhaps there aren't any easy alternatives. – richardkmiller Mar 23 '11 at 15:47
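For bodies that have already been captured in compressed form, the decompression step discussed in these comments could look roughly like the following. This is only a sketch: the helper name decode_body is made up for illustration, and it assumes you have already separated the complete response body from the headers and know its Content-Encoding value, which packet-level tools like tcpdump and ngrep do not do for you.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Return the body with its HTTP Content-Encoding removed (hypothetical helper)."""
    enc = content_encoding.strip().lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        try:
            # "deflate" is supposed to mean zlib-wrapped data ...
            return zlib.decompress(body)
        except zlib.error:
            # ... but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # Unknown or absent encoding: pass the bytes through untouched.
    return body
```

The fallback for raw deflate is there because servers are not consistent about which variant they send under that name.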
1

You can do this with ngrep.

ngrep -q -W byline "Text I'm interested in" port 80 > ~/my_log.txt

sciurus
  • Thanks for introducing me to ngrep. I played around with it a bit and found it easier to use than tcpdump. While using it, it occurred to me that neither ngrep nor tcpdump handles gzipped or deflated content, which is how my browser (Chrome) is requesting it. Do you know if it's possible to ungzip/inflate the HTTP bodies? – richardkmiller Mar 04 '11 at 17:44
  • It may be possible for you to have your browser stop advertising that it accepts compressed content. In Firefox, you could clear the value of network.http.accept-encoding. See http://kb.mozillazine.org/Network.http.accept-encoding – sciurus Mar 04 '11 at 19:13
  • Good point. I've used that in the past, as I just mentioned to Zoredache above, and it sounds like I'll need to continue to do that. – richardkmiller Mar 23 '11 at 15:48
0

You can use polipo with its log level turned all the way up:

polipo logLevel=0xFF

Running polipo -v | grep logLevel shows the available range:

logLevel integer 0x7 Logging level (max = 0xFF).

ife
-2

It doesn't do exactly what you want out-of-the-box, but I'd say a modified version of SimpleHTTPServer would do the trick.

python -m SimpleHTTPServer <port>

It currently lets you run an HTTP server out of your PWD, and prints the access log to the terminal (on STDERR, so add 2>&1 before piping it).
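As a rough idea of what such a modified server might look like, here is a minimal sketch using Python 3's http.server (the successor to SimpleHTTPServer). The names LoggingProxy, emit and HOP_BY_HOP are made up for this example. It assumes plain http:// GET requests only (no HTTPS/CONNECT, no POST), buffers each response fully in memory, and sidesteps the gzip/deflate problem by asking the origin server for an uncompressed body instead of decompressing anything itself.

```python
#!/usr/bin/env python3
"""Sketch of a logging HTTP proxy that copies response bodies to STDOUT."""
import sys
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Per-connection headers that a proxy should not forward.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate",
              "proxy-authorization", "proxy-connection", "te",
              "trailers", "transfer-encoding", "upgrade"}


def emit(body: bytes) -> None:
    """Write a body to STDOUT and flush so a downstream grep sees it promptly."""
    sys.stdout.write(body.decode("utf-8", errors="replace") + "\n")
    sys.stdout.flush()


class LoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser talking to a proxy puts the absolute URL in the request line.
        url = urlsplit(self.path)
        if not url.hostname:
            self.send_error(400, "Expected an absolute http:// URL")
            return
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in HOP_BY_HOP
                   and k.lower() != "accept-encoding"}
        # Ask for an uncompressed body so grep sees plain text.
        headers["Accept-Encoding"] = "identity"
        target = url.path or "/"
        if url.query:
            target += "?" + url.query
        conn = http.client.HTTPConnection(url.hostname, url.port or 80,
                                          timeout=30)
        try:
            conn.request("GET", target, headers=headers)
            resp = conn.getresponse()
            status, resp_headers, body = resp.status, resp.getheaders(), resp.read()
        finally:
            conn.close()

        # Relay the response to the browser with a fresh Content-Length.
        self.send_response_only(status)
        for name, value in resp_headers:
            if name.lower() not in HOP_BY_HOP and name.lower() != "content-length":
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        emit(body)

    def log_message(self, format, *args):
        pass  # suppress the default access log on STDERR


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only starts serving when a port is given on the command line.
    ThreadingHTTPServer(("127.0.0.1", int(sys.argv[1])), LoggingProxy).serve_forever()
```

Saved as, say, proxy.py (a hypothetical filename), it would be run as python3 proxy.py 8888 | grep "Text I'm interested in" > ~/my_log.txt. Servers that ignore the identity request and compress anyway would still produce compressed output.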

Coops
  • Thanks for this pointer. I searched a bit for a command-line option for invoking a proxy server in SimpleHTTPServer but didn't find anything. It may be a matter of my not knowing enough Python. – richardkmiller Mar 04 '11 at 17:53