How does pcap unix buffering work?

Question

Hypothetical scenario: a UDP packet stream arrives at machine X, which is running two programs: one that listens for the packets with recv(), and another that runs pcap.

In this case, as I understand it, the packets are stored in the interface until it is polled by the kernel, which then moves them into a buffer in kernel memory and copies them into two more buffers: one for the program listening with recv() and one for the program listening with pcap. The packets are removed from the respective buffer when they are read, either by pcap_next() or recv(), the next time the process scheduler runs those programs (I assume they are blocking in this case). Is this correct? Are there really 4 buffers used, or is it handled some other way?

I'm looking for a description, as detailed as possible, of which buffers are really involved in this case and how packets move from one to the other (e.g. does a packet get copied to pcap's buffer before it goes to the recv buffer, after, or is the order undefined?).

I know this seems like a big question, but all I really care about is where the packet gets stored and how long it stays there. Bullet points are fine. Ideally I'd like a general answer, but if it varies between operating systems, I'm most interested in Linux.

score 8 · Accepted Answer · answered Feb 23 '11 at 20:42

Linux case (the BSDs are probably somewhat similar, using mbufs instead of skbuffs):

Linux uses skbuffs (socket buffers, `struct sk_buff` in the kernel source) to buffer network data. An skbuff holds metadata about some network data and pointers to that data.

Taps (pcap users) create clones of skbuffs. A clone is a new skbuff, but it points to the same data. When someone needs to modify data shared by several skbuffs (the original skbuff and its clones), it first needs to create a fresh copy (copy-on-write).

When someone no longer needs an skbuff, it calls kfree_skb() on it. kfree_skb() decrements a reference count, and when that count reaches zero, the skbuff is freed. It's slightly more complicated to account for clones, but this is the general idea.

So, you're saying that the recv() function uses the kernel's buffer, but each pcap instance has its own copy of the buffer? At what stage are these clones made: when the packet is received, or when pcap wants to read it? - Benubird, Feb 24 '11 at 09:22
Each pcap instance has its own copy of the metadata, but they all share the same data. The clones are made when the packet is received (in `deliver_skb()` IIRC). - ninjalj, Feb 24 '11 at 18:47

