
I'm looking to use dpkt or pyshark together with Cython to speed up parsing a large amount of data (several GB) in a pcap file.

Has anyone run dpkt with Cython or pyshark with Cython, and could you share the speed increase you saw? I'm specifically looking to speed up a Python script; I'm just not sure whether dpkt or pyshark pairs better with Cython.

Thank you!

Jshee
  • What packets comprise the majority of the pcap? (protocol and size) – Kiran Bandla Oct 06 '16 at 02:15
  • Just calling a Python library from Cython _does not_ give much speed improvement. Only the bits you write yourself are compiled and everything else runs at exactly the same speed. – DavidW Oct 06 '16 at 06:49
  • @KiranBandla - the packets are about 500b – Jshee Oct 06 '16 at 15:14
  • @DavidW - can you give an example please. – Jshee Oct 06 '16 at 15:14
  • @Jshee - No. I know nothing about either dpkt or pyshark. All I'm saying is that I don't think Cython will help you unless you're prepared to rewrite large chunks of the libraries yourself. – DavidW Oct 06 '16 at 16:20
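One way to check the point made in the comments above, before investing in Cython, is to profile the script and see whether the time goes into your own code or into the library. The following is only a sketch using the standard-library cProfile module, written for Python 3; the helper name profile_call is made up for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is the text of the ten
    entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with your own parsing function.
    total, report = profile_call(sum, range(1000000))
    print(report)
```

If most of the cumulative time is attributed to functions inside the library, compiling only the calling script with Cython is unlikely to change much.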

1 Answer


I hope this helps. I found some differences between pyshark and dpkt when I tried to read a pcap file (about 54 MB) into main memory. Here is what happened.

dpkt Module

import dpkt
import time

filename="/opt/veeru_cap.pcap"
f = open(filename, "rb")  # pcap files are binary, so open in binary mode
pcap = dpkt.pcap.Reader(f)

#print pcap[0] #<---Getting TypeError: 'Reader' object does not support indexing

print "Object-->",pcap
start=time.time()
print "The start time->",start
x=list(pcap) # Reading into Main Memory!
print "The end time->",time.time()
print "Total->",time.time()-start
print "Total Length/Total Number of Packet",len(x)
print "**********************PACKET**********************"
print x[0]

OUTPUT>

Object--> <dpkt.pcap.Reader object at 0x7f2ed1535210>
The start time-> 1497818746.66
The end time-> 1497818747.06
Total-> 0.407222986221
Total Length/Total Number of Packet 65150
**********************PACKET**********************
(1497807187.704669, '\x44\x49\x44\xfdg\xa2,\xd0ZG \x4x\x48\x00E\x00\x004E\xcf@\x00@\xx6<\xxf\xxx\xgg\x33i4$\xc2\xf0\x80\x46\x0x\x4b\\\xfd\xea\xe0\xe4\xc2\xb4\xxx\x80\x10\x01l^\xf0\x00\x00\x01\x01\x0x\n\x00\x05\x15@\x054\xexx')
  • dpkt returns the raw packet bytes (shown as hex escapes) without decoding or rendering them.
  • It took very little time to read all the packets into main memory, so counting the packets in the file is easy.
  • As you can see, I tried to print pcap[0] directly, but the Reader object does not support indexing, so the packet could not be displayed that way [NOTE THIS POINT]
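Because the dpkt Reader is iterated rather than indexed, a single packet can be pulled out with itertools instead of building a full list. This is a sketch for Python 3, not the only way to do it: nth_packet and show_packet are hypothetical helper names, and decoding with dpkt.ethernet.Ethernet assumes the capture's link type is Ethernet.

```python
from itertools import islice


def nth_packet(packets, n):
    """Return the n-th (0-based) item of an iterable, or None if it is too short.

    Note: this consumes the iterator up to and including item n.
    """
    return next(islice(packets, n, None), None)


def show_packet(path, n=0):
    """Print the n-th packet of a pcap file, decoded as an Ethernet frame."""
    import dpkt  # third-party; imported here so nth_packet works without it

    with open(path, "rb") as f:
        item = nth_packet(dpkt.pcap.Reader(f), n)
        if item is None:
            print("capture has fewer than", n + 1, "packets")
            return
        ts, buf = item
        # Assumption: Ethernet link layer. Other link types need another class.
        print(ts, repr(dpkt.ethernet.Ethernet(buf)))
```

Calling show_packet("/opt/veeru_cap.pcap") would print a decoded view of the first packet rather than the raw byte string.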

pyshark Module

** Continuing the answer after reboot **

import pyshark
import time

filename="/opt/veeru_cap.pcap"  
cap=pyshark.FileCapture(filename)

print "**********************PACKET**********************"
print cap[0] #<----Still able to print without converting into "List" or something

print "Object--->",type(cap)
start=time.time()
print "The start time->",start
x=list(cap) # Reading into Main Memory!
print "The end time->",time.time()
print "Total->",time.time()-start

I ran the above script, but my computer became unresponsive and I had to reboot.

  • Reading all the packets into main memory takes a long time.
  • The packet display format is really good, just like in Wireshark (check here).
  • Here I am able to print cap[0] without converting to a list, so the capture object can be indexed and iterated directly. However, when I print len(cap) first, it shows 0; if I call len(cap) after printing cap[0], it shows a length of 1.
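The len behaviour above is consistent with pyshark loading packets lazily and counting only those read so far. Calling list(cap) forces every parsed packet to be held in memory at once. A possible alternative, sketched here for Python 3, is to stream the capture with keep_packets=False so that packets are not retained after they are processed. The helper name count_by_highest_layer is made up for illustration, and I have not measured how much this helps on the 54 MB file.

```python
from collections import Counter


def count_by_highest_layer(packets):
    """Tally packets by their highest_layer attribute, one packet at a time."""
    counts = Counter()
    for pkt in packets:
        counts[pkt.highest_layer] += 1
    return counts


def summarize_capture(path):
    """Stream a capture file through pyshark without keeping packets around."""
    import pyshark  # third-party; needs tshark installed

    # keep_packets=False asks pyshark not to store packets after yielding them.
    cap = pyshark.FileCapture(path, keep_packets=False)
    try:
        return count_by_highest_layer(cap)
    finally:
        cap.close()
```

Each packet is still parsed by tshark, so this mainly addresses memory use rather than per-packet parsing cost.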

Tested on

CPython interpreter, Linux
Quad-Core Processor Intel i3 

I have not checked the documentation fully; there may be ways to optimize this.

Veerendra K
  • A couple of observations: 1. dpkt's pcap Reader is an iterator. Trying to index a packet like you tried is bound to fail. 2. Doing `list(pcap)` defeats the efficiency. If you really have to do this, you can do `pcap.readpkts()` – Kiran Bandla Mar 01 '18 at 01:53
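To illustrate the second observation, the packets can be consumed one at a time so that only running totals are held in memory. A minimal sketch for Python 3, assuming each item from the reader is a (timestamp, buffer) pair as in the output shown in the answer; summarize and summarize_file are hypothetical names.

```python
def summarize(packets):
    """Return (packet_count, total_bytes) for an iterable of (timestamp, buf) pairs."""
    count = 0
    total_bytes = 0
    for _ts, buf in packets:
        count += 1
        total_bytes += len(buf)
    return count, total_bytes


def summarize_file(path):
    """Stream a pcap file through dpkt without materializing a list."""
    import dpkt  # third-party; imported here so summarize works without it

    with open(path, "rb") as f:
        return summarize(dpkt.pcap.Reader(f))
```

This gives the packet count without `list(pcap)`; if the full list really is needed, `pcap.readpkts()` from the comment above is the shortcut.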