I have to read and parse .pcap files that are too large to load into memory. I am currently using Scapy's sniff in offline mode:
sniff(offline=file_in, prn=customAction, store=0)
with a customAction function that looks roughly like this:
def customAction(packet):
    global COUNT
    COUNT = COUNT + 1
    # do some other stuff that takes practically 0 time
Currently this processes packets too slowly. I am already using subprocess in a 'driver' program to run this script on multiple files simultaneously on different cores, but I really need to improve single-core performance.
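A driver of that kind could be sketched roughly as below. This is an illustrative assumption rather than the actual driver: the names `run_parser` and `run_all` and the `workers` default are hypothetical, and the parser script is expected to take the capture path as its only argument.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parser(script, pcap_path):
    """Run one parser process for one capture file and return its exit code."""
    # Each call starts a separate interpreter process, so the OS is free
    # to schedule the parsers on different cores.
    completed = subprocess.run([sys.executable, script, pcap_path])
    return completed.returncode


def run_all(script, pcap_paths, workers=4):
    """Run the parser script over many files, at most `workers` at a time."""
    # Threads are sufficient here because they only wait on child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: run_parser(script, path), pcap_paths))
```

This only spreads files across cores; it does nothing for the per-packet cost inside a single process, which is the actual problem here.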
I tried using pypy and was disappointed that performance with pypy was less than 10% better than with python3 (anaconda).
Average time to run 50k packets using pypy is 52.54 seconds
Average time to run 50k packets using python3 is 56.93 seconds
Is there any way to speed things up?
EDIT: Below is the result of cProfile. As you can see, the code is a bit slower while being profiled, but essentially all of the time is spent inside scapy.
66054791 function calls (61851423 primitive calls) in 85.482 seconds

Ordered by: cumulative time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          957/1    0.017    0.000   85.483   85.483 {built-in method builtins.exec}
              1    0.001    0.001   85.483   85.483 parser-3.py:1(<module>)
              1    0.336    0.336   83.039   83.039 sendrecv.py:542(sniff)
          50001    0.075    0.000   81.693    0.002 utils.py:817(recv)
          50001    0.379    0.000   81.618    0.002 utils.py:794(read_packet)
   795097/50003    3.937    0.000   80.140    0.002 base_classes.py:195(__call__)
   397549/50003    6.467    0.000   79.543    0.002 packet.py:70(__init__)
   397545/50000    1.475    0.000   76.451    0.002 packet.py:616(dissect)
   397397/50000    0.817    0.000   74.002    0.001 packet.py:598(do_dissect_payload)
  397545/200039    6.908    0.000   49.511    0.000 packet.py:580(do_dissect)
         199083    0.806    0.000   32.319    0.000 dns.py:144(getfield)
         104043    1.023    0.000   22.996    0.000 dns.py:127(decodeRR)
         397548    0.343    0.000   15.059    0.000 packet.py:99(init_fields)
         397549    6.043    0.000   14.716    0.000 packet.py:102(do_init_fields)
6673299/6311213    6.832    0.000   13.259    0.000 packet.py:215(__setattr__)
3099782/3095902    5.785    0.000    8.197    0.000 copy.py:137(deepcopy)
3746538/2335718    4.181    0.000    6.980    0.000 packet.py:199(setfieldval)
         149866    1.885    0.000    6.678    0.000 packet.py:629(guess_payload_class)
         738212    5.730    0.000    6.311    0.000 fields.py:675(getfield)
        1756450    3.393    0.000    5.521    0.000 fields.py:78(getfield)
          49775    0.200    0.000    5.401    0.000 dns.py:170(decodeRR)
        1632614    2.275    0.000    4.591    0.000 packet.py:191(__getattr__)
  985050/985037    1.720    0.000    4.229    0.000 {built-in method builtins.hasattr}
  326681/194989    0.965    0.000    2.876    0.000 packet.py:122(add_payload)
...
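For anyone who wants to reproduce a report like this, a minimal sketch using the standard library's cProfile and pstats modules is shown below. The helper name `profile_call` and its `top` parameter are hypothetical; it profiles any callable and returns the report sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=25):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Write the statistics into a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the first `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

A whole script can also be profiled from the command line with `python -m cProfile -s cumulative parser-3.py <file>`, which gives the same ordering.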
EDIT 2: Full code example:
from scapy.all import *
from scapy.utils import PcapReader
import os, sys, time, logging  # os is needed for os.chdir below

COUNT = 0

def customAction(packet):
    # Count every packet that passes the lfilter
    global COUNT
    COUNT = COUNT + 1

file_temp = sys.argv[1]
path = '/'.join(file_temp.split('/')[:-2])
file_in = '/'.join(file_temp.split('/')[-2:])
name = file_temp.split('/')[-1:][0].split('.')[0]
os.chdir(path)
q_output_file = 'processed/q_' + name + '.csv'
a_output_file = 'processed/a_' + name + '.csv'
log_file = 'log/' + name + '.log'
logging.basicConfig(filename=log_file, level=logging.DEBUG)
t0=time.time()
sniff(offline=file_in, prn=customAction, lfilter=lambda x:x.haslayer(DNS), store=0)
t1=time.time()
logging.info("File '{}' took {:.2f} seconds to parse {} packets.".format(name, t1-t0, COUNT))