I suspect that tcpflow would do your job well enough; it can take a pcap file and divvy it up into its component parts. For instance, I just did the following as a test:
sudo tcpdump -i eth0 -n -s 0 -w /tmp/capt -v port 80
Then reloaded your question, stopped tcpdump, and ran:
tcpflow -r /tmp/capt
And got about 20 files, each containing a single HTTP request or response (as appropriate).
On the other hand, I usually just take the soft option and open my capture files in Wireshark, whose "Analyze -> Follow TCP Stream" mode is freaking awesome (colour-coded and everything).
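If you'd rather stay on the command line, tshark (Wireshark's terminal front-end) can do much the same follow on a saved capture; something along these lines (the stream index 0 here is just an example -- pick whichever stream you're interested in):
tshark -q -r /tmp/capt -z follow,tcp,ascii,0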
Both of these tools, by the way, can do the packet capture themselves, too -- you don't have to feed them an existing packet capture via tcpdump.
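For example, to have tcpflow capture port-80 traffic straight off the wire itself (the interface name is just an example):
sudo tcpflow -i eth0 port 80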
If you have a specific need to parse the HTTP traffic after you've split it up, it's quite straightforward: the HTTP protocol is very simple. In the trivial (non-keepalive, non-pipelined) case, you can use the following to get the request or response headers:
sed '/^\r$/q' <connectionfile>
And this to get the body of the request/response:
sed -n '/^\r$/,$p' <connectionfile>
(You can also pipe things through those sed commands if you like).
On keepalive connections you need to get a little scripty, but even then it's about 20 lines of script to process the two files (A to B, B to A): extract the headers, read the Content-Length, then read the body (there's a rough sketch of that below). If you're doing any sort of automated processing you'll be writing code to do that stuff anyway, so a bit of HTTP dissection doesn't add much to the workload.
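For what it's worth, here's a rough sketch of that loop as a shell script. Treat it as an illustration rather than a polished tool: it assumes every message carries a Content-Length header (no chunked encoding, no read-until-close bodies), and it works on a copy of one direction of a tcpflow connection file, writing msgN.hdr/msgN.body pairs. Run it once for each direction (the request file and the response file that tcpflow produced).
#!/bin/sh
# Rough sketch: split one direction of a keepalive connection into
# per-message header/body pairs. Assumes every message has a
# Content-Length header -- no chunked encoding, no read-until-close bodies.
cp "$1" work.bin
i=0
while [ -s work.bin ]; do
    i=$((i + 1))
    # The header runs up to and including the first blank (CR-only) line
    sed '/^\r$/q' work.bin > "msg$i.hdr"
    hdr_bytes=$(wc -c < "msg$i.hdr")
    body_bytes=$(tr -d '\r' < "msg$i.hdr" |
        awk 'tolower($1) == "content-length:" { print $2 }')
    : "${body_bytes:=0}"
    # The body is the next Content-Length bytes after the header
    tail -c +$((hdr_bytes + 1)) work.bin | head -c "$body_bytes" > "msg$i.body"
    # Chop this message off the front and go round again
    tail -c +$((hdr_bytes + body_bytes + 1)) work.bin > work.tmp
    mv work.tmp work.bin
done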