I'm interested in using netlink for a straightforward application (reading cgroup stats at high frequency).
The man page cautions that the protocol is not reliable, hinting that the application needs to be prepared to handle dropped packets:
However, reliable transmissions from kernel to user are impossible in any case. The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the user-space process will no longer have the same view of kernel state. It is up to the application to detect when this happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.
Since my requirements are simple, I'm fine with just destroying the socket and creating a new one whenever anything unexpected happens. But I can't find any documentation on what is expected of my program; the man page for recvmsg(2) doesn't even mention ENOBUFS, for example.
What do I need to watch for in order to tell that a request from my application or a response from the kernel has been dropped, so that I can reset everything and start over? It's clear to me that I could do so whenever I receive an error from any of the syscalls involved. But what happens, for example, if my request is dropped on the way to the kernel? Will I just never receive a response? Do I need to build a timeout mechanism where I wait only so long for a response?
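To make the question concrete, here is a rough sketch of the receive side I have in mind. It is only an illustration: `recv_reply`, `NLMSG_HDR` and the 0.5 s default timeout are names and values I made up, and it assumes the kernel echoes my request's `nlmsg_seq` in its reply, which I believe is the convention but have not confirmed for the family I'm using. Any syscall error (ENOBUFS included), a timeout, a truncated message, or an unexpected sequence number is treated as "reset".

```python
import socket
import struct

# Layout of struct nlmsghdr in native byte order:
# nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16), nlmsg_seq (u32), nlmsg_pid (u32)
NLMSG_HDR = struct.Struct("=IHHII")


def recv_reply(sock, expected_seq, timeout_s=0.5, bufsize=65536):
    """Return the raw reply bytes, or None if the caller should reset.

    Hypothetical helper: it works on any datagram-style socket and only
    inspects the first 16 bytes, which it assumes to be an nlmsghdr.
    """
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(bufsize)
    except socket.timeout:
        # Nothing arrived in time: the request or the reply may be lost.
        return None
    except OSError:
        # Covers ENOBUFS (receive buffer overflowed) and every other error.
        return None

    if len(data) < NLMSG_HDR.size:
        # Too short to contain a netlink header.
        return None

    _len, _type, _flags, seq, _pid = NLMSG_HDR.unpack_from(data)
    if seq != expected_seq:
        # A reply to some other request: state is out of sync.
        return None
    return data
```

On `None` I would close the socket, open a new one, and reissue the request. What I can't tell is whether these checks cover every way a message can be lost, or whether the timeout is even necessary.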