
I am currently developing a tool in Java that tracks and interprets data sent across an Ethernet connection. I have already successfully developed both the packet sniffer and the packet data interpreter.

I run into a problem when trying to navigate to specific packets within the trace file. Each packet has an associated timestamp, and I would like to be able to navigate to a specific time window. My current method for doing this is below.

public ArrayList<Packet> getTimeWindow(double time, int window) {
    ArrayList<Packet> packets = new ArrayList<Packet>();
    double start = time - window;
    double end = time + window;

    JpcapCaptor captor = null;
    try {
        captor = JpcapCaptor.openFile(this.traceFile);
    } catch (IOException e) {
        e.printStackTrace();
        return packets; // can't read anything without a captor
    }

    Packet p = captor.getPacket();
    while (p != null) {
        if (p.timestamp > end) return packets;   // past the window; stop early
        if (p.timestamp >= start) packets.add(p);
        p = captor.getPacket();
    }
    return packets;
}

This works fine for small traces, but can get pretty slow when we're dealing with millions of packets. I would like to implement some form of binary search algorithm, but I can't figure out a way to navigate to the middle of the packets without preprocessing them. The packets are not neatly organized by line, and even if I jump to a random point in the file, I can't guarantee I'm at the start of a packet.

In summary: I am looking to develop an efficient way to search for a specific packet in a capture (.pcap or .cap) file. I've scoured the net, and I haven't been able to find anything that can do quite what I'm asking.

If anyone has any ideas / solutions you could suggest, it would be greatly appreciated.

Thanks!

Rob Wagner
  • Not sure if it's viable, but could you use something like a ResultSet? Have it read in, process, then throw away the local data, and then only store a reference to its location (in say a Map) that you can then jump through to find what you're looking for? This way your binary search will be over a well-ordered keySet. You'll still have overhead when initially populating the map, but it won't be as bad as trying to hold it all in memory. – Charles Jun 19 '12 at 20:29
  • That's actually not a bad idea for post processing. I might consider doing this if I can't figure out a way to do it through the single trace file. I'm trying to minimize post processing time, but that's just a want, not a need. I'll let you know if I end up doing this, thanks! – Rob Wagner Jun 19 '12 at 20:37

2 Answers


An easy, small solution is to build a simple index for the files in question. For instance, you can record the offset in the file of the start of every 1000th packet. Store this information (just a sequence of 64-bit indexes into the original trace file) in a small index file. Then when you're doing binary search, you can use this index, together with the original file, to find (within 1000 packets) the correct point to start reading.

Of course, this requires preprocessing (or processing while generating) the trace files.
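A rough sketch of what such an index pass might look like, reading the standard pcap record headers directly rather than going through Jpcap (field offsets are from the pcap-savefile format; the class and method names here are made up for illustration, and a little-endian capture is assumed):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class PcapIndexer {
    static final int GLOBAL_HEADER_LEN = 24;  // standard pcap global header
    static final int RECORD_HEADER_LEN = 16;  // ts_sec, ts_usec, incl_len, orig_len

    // Records the byte offset of every 'every'-th packet record
    // (packets 0, every, 2*every, ...). These offsets would be written
    // to a small side file and used to seek during binary search.
    static List<Long> buildIndex(ByteBuffer buf, int every) {
        buf.order(ByteOrder.LITTLE_ENDIAN);   // assumes a little-endian capture
        List<Long> index = new ArrayList<>();
        int pos = GLOBAL_HEADER_LEN;
        int count = 0;
        while (pos + RECORD_HEADER_LEN <= buf.limit()) {
            if (count % every == 0) index.add((long) pos);
            int inclLen = buf.getInt(pos + 8);   // captured length of this packet
            pos += RECORD_HEADER_LEN + inclLen;  // jump to the next record header
            count++;
        }
        return index;
    }
}
```

The same loop works against a `RandomAccessFile` for traces too large to map into memory; only the read calls change.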

Keith Randall
  • I like this idea, but processing while generating the trace is out of the question, thank you though! – Rob Wagner Jun 20 '12 at 17:54
  • Well, here's the description of the file format Jpcap uses, might be able to detect the packet headers somehow (randomly seek, find something that looks like a header, advance the size the header says, and see if another header is there. Repeat a few times to be sure): http://www.manpagez.com/man/5/pcap-savefile/ – Keith Randall Jun 21 '12 at 00:38
  • I accepted this answer, because your comment led me to my eventual answer. I had to do some fancy footwork, but I eventually got a working solution to my problem. Thanks. – Rob Wagner Jul 09 '12 at 21:44
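The resynchronization idea from the comment above might look roughly like this: from an arbitrary offset, sanity-check a candidate 16-byte record header, then confirm by following the length field to the next header a few times. This is an illustrative sketch, not the asker's actual solution; the plausibility checks and names are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcapResync {
    static final int SNAPLEN = 65535;   // assumed capture snaplen
    static final int REC_HDR = 16;      // ts_sec, ts_usec, incl_len, orig_len

    // Heuristic: does the 16-byte region at 'pos' look like a record header?
    static boolean plausibleHeader(ByteBuffer buf, int pos) {
        if (pos + REC_HDR > buf.limit()) return false;
        int usec = buf.getInt(pos + 4);
        int incl = buf.getInt(pos + 8);
        int orig = buf.getInt(pos + 12);
        return usec >= 0 && usec < 1_000_000   // microseconds field in range
            && incl > 0 && incl <= SNAPLEN     // captured length sane
            && incl <= orig;                   // never longer than on the wire
    }

    // Scan forward from 'start' until 'chain' consecutive headers line up;
    // returns the offset of the first, or -1 if none found.
    static int findRecordStart(ByteBuffer buf, int start, int chain) {
        buf.order(ByteOrder.LITTLE_ENDIAN);    // assumes a little-endian capture
        for (int pos = start; pos < buf.limit(); pos++) {
            int p = pos, ok = 0;
            while (ok < chain && plausibleHeader(buf, p)) {
                p += REC_HDR + buf.getInt(p + 8);  // follow incl_len to next header
                ok++;
            }
            if (ok == chain) return pos;
        }
        return -1;
    }
}
```

Requiring several chained headers (rather than one) makes a false positive on random payload bytes very unlikely.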

This is just a guess, but maybe an interval tree or a segment tree would be a good choice, assuming you can fit all the packets into memory. Interval trees are fairly easy to create if you follow the Cormen et al. algorithm. A segment tree can be more expensive in terms of memory, but should give you faster stabbing queries.

If the packets won't fit into memory, you could use each capture file's timestamp range as the broadest interval and drill down into a file only when someone navigates to its interval.
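Since packets in a capture are already in timestamp order, a simpler alternative (swapping the tree for plain binary search, once the timestamps are loaded into a sorted array) can answer the same window query. A minimal sketch with made-up names:

```java
public class TimestampSearch {
    // First index with ts[i] >= key (lower bound).
    static int lowerBound(double[] ts, double key) {
        int lo = 0, hi = ts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ts[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index with ts[i] > key (upper bound).
    static int upperBound(double[] ts, double key) {
        int lo = 0, hi = ts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ts[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Half-open index range [from, to) of packets whose timestamps fall
    // inside [time - window, time + window], found in O(log n).
    static int[] windowRange(double[] ts, double time, double window) {
        return new int[] { lowerBound(ts, time - window),
                           upperBound(ts, time + window) };
    }
}
```

The two indices can then be mapped back to file offsets (e.g. via an offset index) to read only the packets inside the window.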

Justin