8

I'm using Scapy's rdpcap function to read a PCAP file. I also use the third-party module that adds HTTP support to Scapy, which I need because I have to retrieve all the HTTP requests and responses and their related packets.

I noticed that rdpcap takes too much time to read a large PCAP file.

Is there a solution to read a pcap file faster?

Piotr Kula
auino
  • How big is your pcap file? How long does it take to read it? Is it really too long (even for loading it only once)? How many times do you want to read it (rhetorical question)? – Dr. Jan-Philip Gehrcke May 29 '12 at 13:51
  • My file is greater than 300 MB, I have to launch the Python script more than once. – auino May 29 '12 at 13:56
  • @auino, what specifically is the problem with the read time? Is it that it takes too long to develop your script when you're parsing a 300MB file every time you make a change, or is there some real-time processing requirement? Also, please give us a sense for what is an acceptable parse time – Mike Pennington May 29 '12 at 14:07
  • It takes about 1 hour... that's not good, as I then have to parse the data just loaded... – auino May 29 '12 at 14:28
  • Please use upvote to thank and do not thank in question. – Piotr Kula Sep 25 '13 at 12:06

4 Answers

11

Scapy has another function, sniff, which can also read pcap files:

from scapy.all import sniff

def method_filter_HTTP(pkt):
    # Your processing goes here; called once for every packet read
    pass

sniff(offline="your_file.pcap", prn=method_filter_HTTP, store=0)

rdpcap loads the entire pcap file into memory, so it uses a lot of memory and, as you said, it is slow. sniff, by contrast, reads one packet at a time and passes it to the function given as prn. The store=0 parameter tells sniff not to keep the packets it has already processed, so each one can be freed as soon as the callback returns.

Neuron
wonder
4

I agree the load time is longer than one might expect; that is likely because every packet in the file is parsed into a deeply nested Python object. My workaround was to use editcap to split the capture into smaller files that are easier to read. For example:

$ editcap -B "2013-05-28 10:05:55" -i 5 -F libpcap inputcapture.pcap outputcapture.pcap

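Please note: -B keeps only packets captured before the given time, and -i 5 starts a new output file for every 5 seconds of capture. A full explanation of the switches is available in the editcap man page.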

Also, the -F libpcap part seemed to be necessary (at least for me) for Scapy's rdpcap function to be able to parse the output. (This is supposed to be the default output format, but that was not the case for me, for whatever reason.) You can verify the file type of your input and output files with capinfos (e.g., simply enter capinfos your_capture.pcap).

Both capinfos and editcap ship with the Wireshark distribution.

vincent
1

If you are looking for more responsive code, consider using PcapReader() instead of rdpcap().

PcapReader() returns an iterator that loads a packet only when it is needed, whereas rdpcap() loads the entire trace into memory. PcapReader() is therefore well suited to a large trace that takes forever to load with rdpcap(), or that throws a MemoryError because it is simply too large for your system.

Example code:

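from scapy.all import Ether, PcapReader

# The with block closes the file once iteration is done
with PcapReader('filename.pcap') as packets:
    for packet in packets:
        if Ether not in packet:
            continue  # skip frames without an Ethernet layer
        mac_src = packet[Ether].src
        mac_dst = packet[Ether].dst
        ...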

Please refer to the PcapReader() documentation for more information.

If you are only concerned with the total time needed to produce the final output, rdpcap() might have an advantage over PcapReader(), although I'm not sure how large the difference is.
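To make the difference between the two reading styles concrete, here is a minimal standard-library sketch. It is not Scapy's implementation, and iter_records, read_all and make_demo_capture are hypothetical names. It assumes a classic little-endian pcap file with microsecond timestamps, and contrasts a generator that yields one record at a time with a function that loads everything into a list.

```python
import io
import struct

# Classic pcap layout: 24-byte global header, then a 16-byte header per record.
GLOBAL_HEADER = struct.Struct("<IHHiIII")
RECORD_HEADER = struct.Struct("<IIII")
PCAP_MAGIC = 0xA1B2C3D4


def iter_records(stream):
    """Lazily yield (timestamp, raw_bytes) pairs, one record per iteration."""
    header = stream.read(GLOBAL_HEADER.size)
    if len(header) < GLOBAL_HEADER.size:
        raise ValueError("file too short to be a pcap capture")
    if GLOBAL_HEADER.unpack(header)[0] != PCAP_MAGIC:
        raise ValueError("not a little-endian, microsecond pcap file")
    while True:
        record = stream.read(RECORD_HEADER.size)
        if len(record) < RECORD_HEADER.size:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = RECORD_HEADER.unpack(record)
        yield ts_sec + ts_usec / 1_000_000, stream.read(incl_len)


def read_all(stream):
    """Eager variant: the whole capture ends up in memory at once."""
    return list(iter_records(stream))


def make_demo_capture(payloads):
    """Build an in-memory capture so the sketch runs without a real file."""
    out = io.BytesIO()
    out.write(GLOBAL_HEADER.pack(PCAP_MAGIC, 2, 4, 0, 0, 65535, 1))
    for second, payload in enumerate(payloads):
        out.write(RECORD_HEADER.pack(second, 0, len(payload), len(payload)))
        out.write(payload)
    out.seek(0)
    return out
```

With the generator, only the bytes up to the current record have been read when the first packet is available, which is why a lazy reader feels more responsive even if the total running time is similar.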

Sata
1

Since version 2.4.3, Scapy has built-in support for parsing HTTP. It can be combined with the sessions feature of sniff(), once the HTTP layer has been loaded (for example with load_layer("http")), e.g.:

from scapy.all import TCPSession, sniff
pkts = sniff(offline="http_chunk.pcap.gz", session=TCPSession)

When TCPSession is used with an HTTP/1 capture, sniff() returns a list of 'packets' that contain the assembled data from all the underlying packets that make up each HTTPRequest or HTTPResponse. It still returns individual packets such as ACKs as well. So, for example, if a 'packet' passes the haslayer(HTTPResponse) check, that 'packet' contains the entire response payload. It's also possible to use answers() to match requests and responses. For a large capture, pass a prn callback together with store=0 so that packets are handled one at a time; with store=0 the returned list is empty. Note that sniff() works on a live capture, an offline capture file, or a list of packets.

Pierz