I suspect that tcpflow would do your job well enough; it can take a pcap file and divvy it up into its component parts. For instance, I just did the following as a test:
sudo tcpdump -i eth0 -n -s 0 -w /tmp/capt -v port 80
Then reloaded your question, stopped tcpdump, and ran:
tcpflow -r /tmp/capt
And got about 20 files, each containing a single HTTP request or response (as appropriate).
On the other hand, I usually just take the soft option and open my capture files in Wireshark, whose "Analyze -> Follow TCP Stream" mode is freaking awesome (colour-coded and everything).
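If you'd rather stay on the command line, tshark (Wireshark's terminal front-end) can do much the same follow on a saved capture; something along these lines (the stream index 0 here is just an example -- pick whichever stream you're interested in):
tshark -q -r /tmp/capt -z follow,tcp,ascii,0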
Both of these tools, by the way, can do the packet capture themselves, too -- you don't have to feed them an existing packet capture via tcpdump.
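For example, to have tcpflow capture port-80 traffic straight off the wire itself (the interface name is just an example):
sudo tcpflow -i eth0 port 80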
If you have a specific need to parse the HTTP traffic after you've split it up, it's quite straightforward: the HTTP protocol is very simple. In the trivial (non-keepalive, non-pipelined) case, you can use the following to get the request or response headers:
sed '/^\r$/q' <connectionfile>
And this to get the body of the request/response:
sed -n '/^\r$/,$p' <connectionfile>
(You can also pipe things through those sed commands if you like).
On keepalive connections you need to get a little scripty, but even then it's about 20 lines of script to process the two files (A to B, B to A): extract the headers, read the Content-Length, then read the body (there's a rough sketch of that below). If you're doing any sort of automated processing you'll be writing code to do that stuff anyway, so a bit of HTTP dissection doesn't add much to the workload.
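For what it's worth, here's a rough sketch of that loop as a shell script. Treat it as an illustration rather than a polished tool: it assumes every message carries a Content-Length header (no chunked encoding, no read-until-close bodies), and it works on a copy of one direction of a tcpflow connection file, writing msgN.hdr/msgN.body pairs. Run it once for each direction (the request file and the response file that tcpflow produced).
#!/bin/sh
# Rough sketch: split one direction of a keepalive connection into
# per-message header/body pairs. Assumes every message has a
# Content-Length header -- no chunked encoding, no read-until-close bodies.
cp "$1" work.bin
i=0
while [ -s work.bin ]; do
    i=$((i + 1))
    # The header runs up to and including the first blank (CR-only) line
    sed '/^\r$/q' work.bin > "msg$i.hdr"
    hdr_bytes=$(wc -c < "msg$i.hdr")
    body_bytes=$(tr -d '\r' < "msg$i.hdr" |
        awk 'tolower($1) == "content-length:" { print $2 }')
    : "${body_bytes:=0}"
    # The body is the next Content-Length bytes after the header
    tail -c +$((hdr_bytes + 1)) work.bin | head -c "$body_bytes" > "msg$i.body"
    # Chop this message off the front and go round again
    tail -c +$((hdr_bytes + body_bytes + 1)) work.bin > work.tmp
    mv work.tmp work.bin
done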