Hypothetical scenario: A UDP packet stream arrives at machine X, which is running two programs: one listening for the packets with recv(), and another capturing them with pcap.

In this case, as I understand it, the packets are stored in the interface until it is polled by the kernel, which then moves them into a buffer in the kernel's memory and copies the packets into another two buffers: one for the program listening with recv(), and one for the program listening with pcap. The packets are removed from the respective buffer when they are read, either by pcap_next() or recv(), the next time the process scheduler runs those programs (I assume the calls are blocking in this case). Is this correct? Are there really 4 buffers used, or is it handled some other way?

I'm looking for a description, as detailed as possible, of which buffers are really involved in this case, and how packets move from one to the other (e.g. does a packet get copied to pcap's buffer before it goes to the recv buffer, after, or is the order undefined?).

I know this seems like a big question, but all I really care about is where the packet gets stored, and how long it stays there. Bullet points are fine. Ideally I'd like a general answer, but if it varies between OSes I'm most interested in Linux.

Benubird

1 Answer

Linux case (the BSDs are probably somewhat similar, using mbufs instead of skbuffs):

Linux uses skbuffs (socket buffers) to buffer network data. An skbuff holds metadata about some network data, plus pointers to that data.

Taps (pcap users) create clones of skbuffs. A clone is a new skbuff, but it points to the same data. When someone needs to modify data shared by several skbuffs (the original skbuff and its clones), it first needs to create a fresh copy (copy-on-write).

When someone doesn't need an skbuff anymore, it kfree_skb()'s it. kfree_skb() decrements a reference count, and when that reference count reaches zero, the skbuff is freed. It's slightly more complicated to account for clones, but this is the general idea.
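To make the clone, copy-on-write, and reference-count ideas above concrete, here is a minimal userspace sketch. It is a toy model, not the kernel's `struct sk_buff`: the names `toy_skb`, `toy_data`, `toy_alloc`, `toy_clone`, `toy_unshare`, and `toy_free` are all hypothetical, and the real kernel structures and accounting are more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_data {
    int refs;              /* number of headers pointing at this payload */
    size_t len;
    unsigned char *bytes;
};

struct toy_skb {
    int users;             /* number of holders of this header */
    struct toy_data *data; /* payload, possibly shared with clones */
};

/* Build a payload block holding a private copy of src. */
static struct toy_data *toy_data_new(const unsigned char *src, size_t len)
{
    struct toy_data *d = malloc(sizeof *d);
    unsigned char *bytes = malloc(len ? len : 1);
    if (!d || !bytes) { free(d); free(bytes); return NULL; }
    if (len) memcpy(bytes, src, len);
    d->refs = 1; d->len = len; d->bytes = bytes;
    return d;
}

/* Allocate a fresh header with its own payload. */
static struct toy_skb *toy_alloc(const void *payload, size_t len)
{
    struct toy_skb *skb = malloc(sizeof *skb);
    if (!skb) return NULL;
    skb->data = toy_data_new(payload, len);
    if (!skb->data) { free(skb); return NULL; }
    skb->users = 1;
    return skb;
}

/* Clone: a new header that shares the original's payload. */
static struct toy_skb *toy_clone(struct toy_skb *orig)
{
    struct toy_skb *c = malloc(sizeof *c);
    if (!c) return NULL;
    c->users = 1;
    c->data = orig->data;
    c->data->refs++;
    return c;
}

/* Copy-on-write: make the payload private before modifying it. */
static int toy_unshare(struct toy_skb *skb)
{
    if (skb->data->refs == 1) return 0;   /* already private */
    struct toy_data *d = toy_data_new(skb->data->bytes, skb->data->len);
    if (!d) return -1;
    skb->data->refs--;
    skb->data = d;
    return 0;
}

/* Drop one reference; the payload is freed with its last header. */
static void toy_free(struct toy_skb *skb)
{
    assert(skb->users > 0);
    if (--skb->users > 0) return;
    if (--skb->data->refs == 0) {
        free(skb->data->bytes);
        free(skb->data);
    }
    free(skb);
}
```

In this model, a tap would call `toy_clone()` on the incoming buffer and later `toy_free()` its clone; the payload only goes away once both the tap's clone and the original header have been freed. A holder that wants to modify the bytes calls `toy_unshare()` first, so the other holders keep seeing the original data.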

ninjalj
  • So, you're saying that the recv() function uses the kernel's buffer, but each pcap instance has its own copy of the buffer? At what stage are these clones made: when the packet is received, or when pcap wants to read it? – Benubird Feb 24 '11 at 09:22
  • Each pcap instance has its own copy of the metadata, but they all share the same data. The clones are made when the packet is received (in `deliver_skb()` IIRC). – ninjalj Feb 24 '11 at 18:47