
I know that information exchange can happen via the following interfaces between the kernel and user space programs:

  • system calls

  • ioctls

  • /proc & /sys

  • netlink

I want to find out

  • Have I missed any other interface?

  • Which one of them is the fastest way to exchange large amounts of data? (And is there any document/mail/explanation supporting such a claim that I can refer to?)

  • Which one is the recommended way to communicate? (I think it's netlink, but I would still love to hear opinions.)

Methos

6 Answers


The fastest way to exchange vast amounts of data is memory mapping. The mmap call can be used on a device file, and the corresponding kernel driver can then decide to map kernel memory into the user address space. A good example of this is the Video4Linux drivers, and I suppose the frame buffer driver works the same way. For a good explanation of how the V4L2 driver works, see:

You can't beat memory mapping for large amounts of data, because there is no memcpy-like operation involved: the physical underlying memory is effectively shared between kernel and userspace. Of course, as with all shared-memory mechanisms, you have to provide some synchronisation so that kernel and userspace don't think they have ownership at the same time.
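
To make the idea concrete, here is a minimal sketch of the userspace side, assuming a hypothetical /dev/mydev whose driver implements the mmap file operation and exports a 1 MiB buffer (the device name and size are made up for illustration):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (1 << 20)   /* 1 MiB, assumed to match the driver */

int main(void)
{
    int fd = open("/dev/mydev", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* The driver's mmap handler maps its kernel buffer into our
     * address space; subsequent accesses involve no copying. */
    unsigned char *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte: %u\n", buf[0]);  /* direct read, no memcpy */

    munmap(buf, BUF_SIZE);
    close(fd);
    return 0;
}
```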

shodanex

Shared memory between kernel and userspace is doable; see http://kerneltrap.org/node/14326 for instructions/examples.

You can also use a named pipe, which is pretty fast.
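
For example, a minimal reader for a named pipe might look like this (the /tmp/myfifo path is just an illustration; any other process acts as the writer):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    mkfifo("/tmp/myfifo", 0666);            /* harmless if it already exists */
    int fd = open("/tmp/myfifo", O_RDONLY); /* blocks until a writer opens it */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("got: %s", buf);
    }
    close(fd);
    return 0;
}
```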

All this really depends on what data you are sharing, whether it is accessed concurrently, and how the data is structured. Plain system calls may be enough for simple data.

Linux kernel /proc FIFO/pipe might also help.

good luck

Aiden Bell

You may also consider relay (formerly relayfs):

"Basically relayfs is just a bunch of per-cpu kernel buffers that can be efficiently written into from kernel code. These buffers are represented as files which can be mmap'ed and directly read from in user space. The purpose of this setup is to provide the simplest possible mechanism allowing potentially large amounts of data to be logged in the kernel and 'relayed' to user space."

http://relayfs.sourceforge.net/
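
A rough kernel-side sketch of setting up a relay channel, assuming a reasonably recent kernel and following the pattern described in Documentation/filesystems/relay.rst (the "mylog" name and buffer sizes are arbitrary):

```c
#include <linux/module.h>
#include <linux/relay.h>
#include <linux/debugfs.h>

static struct rchan *chan;

/* relay asks us where to put each per-cpu buffer file; here we
 * expose them in debugfs using relay's own file_operations. */
static struct dentry *create_buf_file(const char *filename,
                                      struct dentry *parent, umode_t mode,
                                      struct rchan_buf *buf, int *is_global)
{
    return debugfs_create_file(filename, mode, parent, buf,
                               &relay_file_operations);
}

static int remove_buf_file(struct dentry *dentry)
{
    debugfs_remove(dentry);
    return 0;
}

static const struct rchan_callbacks relay_cbs = {
    .create_buf_file = create_buf_file,
    .remove_buf_file = remove_buf_file,
};

static int __init relay_example_init(void)
{
    /* 8 sub-buffers of 4 KiB per cpu; userspace mmaps or reads
     * the per-cpu files that appear under debugfs. */
    chan = relay_open("mylog", NULL, 4096, 8, &relay_cbs, NULL);
    if (!chan)
        return -ENOMEM;
    relay_write(chan, "hello\n", 6);   /* cheap per-cpu write */
    return 0;
}

static void __exit relay_example_exit(void)
{
    relay_close(chan);
}

module_init(relay_example_init);
module_exit(relay_example_exit);
MODULE_LICENSE("GPL");
```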

Dénes Tarján

You can obviously do shared memory with copy_from_user etc., and you can easily set up a character device driver: basically all you have to do is fill in a file_operations structure. But this is by far not the fastest way.

I have no benchmarks, but system calls on modern systems should be the fastest. My reasoning is that they are what has been optimized for the most. It used to be that to get from user mode to kernel mode one had to raise an interrupt, which would then go through the interrupt table (an array) to locate the interrupt handler (vector 0x80), and only then switch to kernel mode. This was really slow, and then came the sysenter instruction, which makes this process really fast. Without going into details, sysenter loads CS:EIP directly from machine-specific registers, and the switch is quite fast. Shared memory, on the contrary, requires writing to and reading from memory, which is infinitely more expensive than reading from a register.
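
To show how little is involved, here is a minimal sketch of such a character device, assuming a reasonably recent kernel; the "hello" device name and message are made up, and the misc-device helper is used to avoid managing major numbers by hand:

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

static const char msg[] = "hello from the kernel\n";

/* Copies data to user space on read(2); simple_read_from_buffer
 * wraps the copy_to_user bookkeeping. */
static ssize_t hello_read(struct file *f, char __user *buf,
                          size_t len, loff_t *off)
{
    return simple_read_from_buffer(buf, len, off, msg, sizeof(msg));
}

static const struct file_operations hello_fops = {
    .owner = THIS_MODULE,
    .read  = hello_read,
};

static struct miscdevice hello_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "hello",              /* shows up as /dev/hello */
    .fops  = &hello_fops,
};

module_misc_device(hello_dev);     /* registers on load, deregisters on unload */
MODULE_LICENSE("GPL");
```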

daniel
  • Surely you meant "several orders of magnitude" instead of "infinitely" [/nitpick] – Piskvor left the building Jun 03 '09 at 07:53
  • a system call still requires a context switch and saving/restoring registers, regardless of whether you `int` or `sysenter`. shmem writes go into the CPU cache, not directly to memory, so they are fast unless there are cache misses. – Peter Cordes Dec 09 '09 at 19:55

Here is a possible compilation of all the possible interfaces, although in some ways they overlap one another (e.g., sockets are themselves used through system calls):

  • Procfs
  • Sysfs
  • Configfs
  • Debugfs
  • Sysctl
  • devfs (e.g., character devices)
  • TCP/UDP sockets
  • Netlink sockets
  • Ioctl
  • Kernel system calls
  • Signals
  • Mmap
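
To give one concrete instance from the list, here is a sketch of a userspace netlink socket subscribed to kernel uevents (device add/remove notifications); it is an illustration, not a full netlink protocol implementation:

```c
#include <linux/netlink.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_nl addr = {
        .nl_family = AF_NETLINK,
        .nl_groups = 1,            /* the uevent multicast group */
    };
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("netlink");
        return 1;
    }

    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0); /* blocks for one event */
    if (n > 0) {
        buf[n] = '\0';
        printf("uevent: %s\n", buf);   /* e.g. "add@/devices/..." */
    }
    close(fd);
    return 0;
}
```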
Peter Teoh

As for shared memory: I've found that even with NUMA, two threads running on two different cores and communicating through shared memory still require writes to and reads from the L3 cache, which, if you are lucky (both cores on one socket), is about 2x slower than a syscall, and if you are not (cores on different sockets), is about 5x or more slower than a syscall. I think the syscall's hardware mechanism helps here.

bing zhu