I have collected a few MB of network traffic and want to run analysis on it. The problem I am facing is how to store it so that searching it is fast.

My first idea was to put it into a database with every possible attribute as a column, but then I realized that the table would contain a lot of NULL values, which I believe would hurt the database's performance. Even if the degradation is minor for a small database, it will become drastic when I parse a large *.pcap file (greater than 1 GB), since both the size of the database and the number of NULL values in the table will grow.

So is there a better way to store a *.pcap file for analysis? I have been looking into an XML tree as a solution, but I'm not sure about it. I am using Python with the dpkt module to extract data from the *.pcap file.

Thanks in advance.

thecreator232
  • What sort of performance are you getting parsing the pcap? What sort of performance are you hoping to achieve? – tMC Oct 03 '13 at 18:03
  • What are your plans with the parsed data? Do you want to browse it? Search for something? Filter out specific packets? – Milo Oct 03 '13 at 18:20
  • @Milo: I want to search the data to get the matching packets, and also filter specific packets. The main task is searching the data. – thecreator232 Oct 03 '13 at 18:23
  • @tMC: I haven't been able to parse the data into anything, as I don't have a proper way of storing it. So basically I just parsed a pcap file into a text file. As for performance, right now it sucks. – thecreator232 Oct 03 '13 at 18:29
  • I wrote an MPEG-2 Transport Stream analyser (TS files are huge) and my approach was to build a special map to the file. It was a simple list in memory that contained some basic information about the packets and parsing that was fast. And when I wanted thorough information I parsed the selected packet (and a few more that surrounded it) from the TS file. I think this can be applied here as well: e.g. if you want to filter out UDP or TCP packets then add that information to the map. It's just an idea. – Milo Oct 03 '13 at 18:48
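
The in-memory map described in the last comment could be sketched roughly as follows. This is only an illustration, assuming a classic (non-pcapng) pcap file with microsecond timestamps; `build_index`, `read_packet` and the synthetic two-packet capture are hypothetical names and data, not part of dpkt. The index keeps just a timestamp, a file offset and a length per packet; extra fields such as the protocol could be added while parsing each packet with dpkt.

```python
import io
import struct

PCAP_GLOBAL_HEADER = 24  # size in bytes of the file header
PCAP_RECORD_HEADER = 16  # size in bytes of each per-packet header


def build_index(f):
    """Return a list of (timestamp, offset, length) tuples, one per packet."""
    magic = f.read(PCAP_GLOBAL_HEADER)[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    index = []
    while True:
        record = f.read(PCAP_RECORD_HEADER)
        if len(record) < PCAP_RECORD_HEADER:
            break  # end of file (or truncated record)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        # f.tell() now points at the first byte of the packet data
        index.append((ts_sec + ts_usec / 1e6, f.tell(), incl_len))
        f.seek(incl_len, io.SEEK_CUR)  # skip the packet body without reading it
    return index


def read_packet(f, entry):
    """Fetch the raw bytes of one indexed packet on demand."""
    _ts, offset, length = entry
    f.seek(offset)
    return f.read(length)


def make_demo_pcap():
    """Build a tiny synthetic capture in memory (illustrative data only)."""
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype (1 = Ethernet)
    data = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for ts, payload in [(1, b"first"), (2, b"second!")]:
        data += struct.pack("<IIII", ts, 0, len(payload), len(payload)) + payload
    return io.BytesIO(data)


demo = make_demo_pcap()
index = build_index(demo)
second = read_packet(demo, index[1])
```

With a real capture you would pass `open("input_file.pcap", "rb")` instead of the in-memory demo, filter the small index list in memory, and only hand the bytes of the selected packets to dpkt for full decoding.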

1 Answer

You can maybe do it in two steps:

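First, use the tshark tool to convert the pcap file to CSV. Quoting the fields (-E quote=d) keeps commas inside values such as frame.time or tcp.port from splitting columns. For example:

tshark -r input_file.pcap -n -T fields -E separator=, -E quote=d -e frame.time -e ip.src -e ip.dst -e ip.proto -e tcp.port > outfile.csv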

Then use the csv module in Python to read it and do your analysis.

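 import csv
 # newline="" lets the csv module handle line endings itself
 with open("outfile.csv", "r", newline="") as f:
     reader = csv.reader(f, delimiter=",")
     for row in reader:
         # each row is a list of strings, one per tshark field
         print(row)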
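
If the searches need to be fast and repeated, one option is to load the CSV rows into SQLite and index the columns you search on. A minimal sketch, assuming the column order of the tshark command above and that fields are quoted so commas inside a value do not split columns; the table name, column names, helper functions and sample rows are made up for illustration:

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in the order frame.time, ip.src, ip.dst, ip.proto, tcp.port.
# Real data would come from open("outfile.csv", newline="") instead.
SAMPLE_CSV = '''"Oct  3, 2013 18:03:00.000000000","10.0.0.1","10.0.0.2","6","51000,80"
"Oct  3, 2013 18:03:00.100000000","10.0.0.2","10.0.0.1","6","80,51000"
"Oct  3, 2013 18:03:01.000000000","10.0.0.3","10.0.0.1","17",""
'''


def load_packets(csv_file, conn):
    """Copy CSV rows into a table and index the columns used for searching."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packets "
        "(time TEXT, src TEXT, dst TEXT, proto INTEGER, ports TEXT)"
    )
    # Skip malformed rows that do not have exactly five fields
    rows = (row for row in csv.reader(csv_file) if len(row) == 5)
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_src ON packets (src)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_proto ON packets (proto)")
    conn.commit()


def packets_from(conn, src):
    """Return (time, dst, proto) for every packet sent by the given source address."""
    query = "SELECT time, dst, proto FROM packets WHERE src = ?"
    return conn.execute(query, (src,)).fetchall()


conn = sqlite3.connect(":memory:")  # use a file path to keep the data between runs
load_packets(io.StringIO(SAMPLE_CSV), conn)
print(packets_from(conn, "10.0.0.1"))
```

Because only the fields you export become columns, the table stays narrow and avoids the many-NULL-columns layout described in the question.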

Hope this helps.

mguijarr