From the kernel to the user space (DMA)

Question

Lately, I have been reading a lot of websites and books about 10 Gb/s NICs, their DMA, and the way data is handled by the Linux kernel (10/100 Mb/s NICs), and a few questions came to mind.

What would be the easiest way to send a 10 Gb/s flow of data from the NIC to user-land? (I assume the data can be processed in user-land at the same rate.)

And:

Do you think it would be a good idea to implement the DMA buffer inside user space and read the raw data directly from there (processing it, obviously, at the same rate)?

Or are there any better solutions I didn't think of?

Thank you.

score 6 · Answer 1 · answered Jun 21 '12 at 13:24


The easiest thing is to use Linux's normal sockets. It might not be the most efficient option, but it is the easiest.
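
As a baseline for comparison, here is a minimal sketch of the normal-socket path in Python. The helper names `open_receiver` and `drain`, the 1 MiB `SO_RCVBUF` request, and the 2048-byte buffer are assumptions chosen for illustration. It reuses one preallocated buffer via `recv_into` so no new bytes object is created per packet, but the kernel still copies each datagram into that buffer.

```python
import socket


def open_receiver(host="127.0.0.1", port=0, rcvbuf=1 << 20):
    """Open a UDP socket and request a larger kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The kernel may clamp this request to its configured maximum.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    return sock


def drain(sock, max_packets, bufsize=2048, timeout=1.0):
    """Receive up to max_packets datagrams; return (packets, total_bytes)."""
    sock.settimeout(timeout)
    buf = bytearray(bufsize)   # allocated once, reused for every packet
    view = memoryview(buf)
    packets = total = 0
    while packets < max_packets:
        try:
            n = sock.recv_into(view)   # kernel copies the datagram into buf
        except socket.timeout:
            break                      # nothing arrived within the timeout
        packets += 1
        total += n
    return packets, total
```

Counting packets and bytes per second with a loop like this gives a reference number to compare any kernel-bypass approach against.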

There are frameworks that let you receive and transmit data very efficiently in user space. They map the same buffers to both the NIC (for DMA) and the process, so the data doesn't need to be copied.
These frameworks bypass the kernel completely: you have to interact with the NIC directly. Examples of such frameworks are PF_RING and netmap.
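
To make the shared-buffer idea concrete, here is a toy model in Python. It is not the PF_RING or netmap API: the class `SlotRing` and its methods are hypothetical, and an anonymous `mmap` region stands in for memory that a real framework would share with the NIC. The point it illustrates is that the consumer reads a frame through a view of the same memory the producer wrote into, so no copy is made on the receive side.

```python
import mmap


class SlotRing:
    """Toy single-producer/single-consumer ring over one shared buffer."""

    def __init__(self, slots=8, slot_size=2048):
        self.slots = slots
        self.slot_size = slot_size
        # Anonymous mapping: a stand-in for the region shared with the NIC.
        self.mem = mmap.mmap(-1, slots * slot_size)
        self.view = memoryview(self.mem)
        self.lengths = [0] * slots
        self.head = 0  # count of frames written by the producer
        self.tail = 0  # count of frames released by the consumer

    def produce(self, frame):
        """Stand-in for the NIC's DMA write; returns False if it must drop."""
        if self.head - self.tail == self.slots or len(frame) > self.slot_size:
            return False
        i = self.head % self.slots
        start = i * self.slot_size
        self.view[start:start + len(frame)] = frame
        self.lengths[i] = len(frame)
        self.head += 1
        return True

    def peek(self):
        """Return a zero-copy view of the oldest frame, or None if empty."""
        if self.head == self.tail:
            return None
        i = self.tail % self.slots
        start = i * self.slot_size
        return self.view[start:start + self.lengths[i]]

    def release(self):
        """Hand the oldest slot back to the producer."""
        if self.head != self.tail:
            self.tail += 1
```

When the ring is full, `produce` returns False, which mirrors what happens on real hardware: if the consumer does not release slots fast enough, frames are dropped.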


ugoren


I need something more efficient than normal Linux sockets. I had read a bit about PF_RING and netmap, and I'll try to read more about them today; thanks for the answer. However, I'm not sure which solution (PF_RING, netmap, or DMA in user-land) is the fastest. – AdventN Jun 21 '12 at 14:14
I have been reading about PF_RING and netmap, as well as PACKET_MMAP, and they all seem to take almost the same approach. I've also looked up Van Jacobson's slides, which had very good ideas in them! I guess I will try PACKET_MMAP first and then run a small benchmark against PF_RING to see which solution is the most appropriate. But I wonder whether it would be possible to keep packets from entering an sk_buff at all and have them handled directly by a ring buffer in user-land or by PF_RING (and how difficult that would be to implement from scratch). – AdventN Jun 25 '12 at 13:51

pierigno · Answer 2 · 2012-07-17T16:32:54.967

2

I would also suggest taking a look at the PFQ framework (https://github.com/pfq/PFQ and http://netgroup.iet.unipi.it/software/pfq/), which builds on netmap and PF_RING concepts and hides them to allow simple multi-core packet processing in user space.
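
One common way to spread packets over several cores is to hash each packet's flow identifier, so that all packets of a flow land on the same worker. The sketch below shows that general idea only; it is not PFQ's API, and the function `pick_worker` and its parameters are hypothetical names chosen for illustration.

```python
import zlib


def pick_worker(src_ip, dst_ip, src_port, dst_port, proto, n_workers):
    """Map a flow's 5-tuple to a worker index in range(n_workers)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    # CRC32 is deterministic, so a given flow always maps to the same worker.
    return zlib.crc32(key) % n_workers
```

Keeping a flow on one worker avoids reordering within the flow and means per-flow state does not need to be shared between cores.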

edited Jul 17 '12 at 16:32

answered Jul 17 '12 at 08:15


I'll have a look into that as well, thanks for the reply. At the moment I have only investigated the PF_RING solution proposed earlier. – AdventN Aug 22 '12 at 11:29

score 0 · Answer 3 · answered Jul 17 '12 at 08:40

I remember hearing a talk at the Ottawa Linux Symposium by some folks from Intel who implemented exactly what you proposed. When they measured the results against the normal socket interface, they were surprised to discover that for many workloads this approach did not perform any better, and sometimes performed worse(!), than the socket interface.

I searched but could not locate the exact paper online right now, but maybe this gives you a hint...
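
When comparing approaches, it helps to know the per-packet time budget at line rate. The sketch below works it out for Ethernet, assuming 20 bytes of per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap) and frame sizes that include the FCS. The function name `packet_budget` is my own.

```python
ETHERNET_OVERHEAD_BYTES = 8 + 12  # preamble/SFD + inter-frame gap


def packet_budget(link_bps, frame_bytes):
    """Return (packets per second, nanoseconds per packet) at line rate."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    pps = link_bps / bits_on_wire
    ns_per_packet = 1e9 / pps
    return pps, ns_per_packet


if __name__ == "__main__":
    for size in (64, 1518):
        pps, ns = packet_budget(10e9, size)
        print(f"{size:>5} B frames: {pps / 1e6:.2f} Mpps, {ns:.1f} ns per packet")
```

At 10 Gb/s this gives about 14.88 million packets per second (67.2 ns each) for minimum-size 64-byte frames, and about 0.81 million packets per second (1230.4 ns each) for 1518-byte frames. Whether a zero-copy design pays off depends heavily on which end of that range your traffic sits at.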
