I have worked with network programming before. But this is my first foray into netlink sockets.
I have chosen to study the 'connector' type of netlink sockets. As with any kernel component, it has a user counterpart as well. The Linux kernel has a sample program called ucon.c which can be used to build userspace programs based on the aforementioned connector netlink sockets.
Here I want to point out the parts of the program where I would like to confirm my understanding, and the parts whose logic I do not follow.
As far as I understand, netlink sockets are an IPC method used to connect processes on the same machine, and hence the process ID is used as an identifier. Since netlink messages can also be multicast, the other identifier a netlink socket needs is the message group. All components that are connected to the same message group are in fact related. So while for IPv4 we use a sockaddr_in in place of the sockaddr, here we use a sockaddr_nl, which contains the identifiers mentioned above.
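To make the two identifiers concrete, here is a minimal sketch of filling a sockaddr_nl. The helper name make_nl_addr and the values passed to it are my own assumptions for illustration and are not taken from ucon.c.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build a netlink address from a port ID and a
 * bitmask of multicast groups. */
static struct sockaddr_nl make_nl_addr(unsigned int pid, unsigned int groups)
{
	struct sockaddr_nl addr;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK; /* plays the role AF_INET plays in sockaddr_in */
	addr.nl_pid = pid;           /* unicast identifier; 0 lets the kernel assign one on bind */
	addr.nl_groups = groups;     /* bitmask of multicast groups to join */
	return addr;
}
```

The returned structure would then be cast to struct sockaddr * and passed to bind(), just as a sockaddr_in would be for an IPv4 socket.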
Now, since we are not going to use the TCP/IP stack of the kernel, in case of netlink messages, netlink packets can be considered to be raw. Hence the only encapsulation that the netlink packet goes through is the netlink message header defined as nlmsghdr.
Now coming to our program ucon: main() first creates a NETLINK family socket with the connector protocol. Then it fills in the aforementioned netlink socket address structure with the relevant information. In order to be a little experimental here, I have added an entry in the connector.h file. Now here comes my first question.
A connector message has a certain type defined in connector.h. Now this connector message structure is something that is completely internal to netlink right? As in, as far as netlink is concerned, this is all but payload. Right?
Moving on, what exactly does the nl-group field mean within the netlink message header structure? The definition does not really contain an element of this name. So are we using overlay techniques to fill certain fields of the netlink message header? And if so, what exactly is the correspondence? I cannot seem to find it anywhere.
After binding the socket address to the socket, it sends 10,000 separate pieces of connector-based data, which as far as netlink is concerned is pure payload. What is strange about these messages is that all of them seem to have the same sequence number.
Moving on, we find ourselves in the netlink_send subroutine, which sends these packets via the socket that we bound above. This subroutine uses a variety of netlink helper macros to manipulate the data to send. As we saw above, the main() function sends 10,000 pieces of data, each of which is zero-length and requires no acknowledgement, since the ack field is 0. So each 'packet' is nothing but a connector message header without anything in it. Right?
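The layout described above (a netlink header followed by an empty connector header) can be sketched roughly as follows. This is only an illustration under my own assumptions: build_empty_cn_packet, EXAMPLE_IDX and EXAMPLE_VAL are made-up names, the message type NLMSG_DONE is an assumption, and the buffer passed in is assumed to be suitably aligned for struct nlmsghdr.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>

/* Hypothetical connector ID values, chosen only for this sketch. */
#define EXAMPLE_IDX 0x123
#define EXAMPLE_VAL 0x456

/* Wrap an empty connector message in a netlink header inside buf.
 * Returns the netlink message length, or 0 if buf is too small. */
static unsigned int build_empty_cn_packet(void *buf, size_t buflen,
					  unsigned int nl_seq,
					  unsigned int port_id)
{
	unsigned int total = NLMSG_SPACE(sizeof(struct cn_msg));
	struct nlmsghdr *nlh = buf;
	struct cn_msg *msg;

	if (buflen < total)
		return 0;
	memset(buf, 0, total);

	/* Netlink header: the only encapsulation netlink itself adds. */
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct cn_msg));
	nlh->nlmsg_type = NLMSG_DONE; /* assumption for this sketch */
	nlh->nlmsg_flags = 0;
	nlh->nlmsg_seq = nl_seq;
	nlh->nlmsg_pid = port_id;

	/* Netlink payload: a connector header with no data behind it. */
	msg = NLMSG_DATA(nlh);
	msg->id.idx = EXAMPLE_IDX;
	msg->id.val = EXAMPLE_VAL;
	msg->seq = 0;
	msg->ack = 0; /* no acknowledgement requested */
	msg->len = 0; /* zero bytes of connector data follow */
	return nlh->nlmsg_len;
}
```

With this layout the whole packet is NLMSG_HDRLEN plus sizeof(struct cn_msg) bytes long, and nothing follows the connector header.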
Now what is surprising is that the netlink_send function uses the same sequence number as main(), since it is a global variable. However, after the post-increment in main(), it is now '1'. So basically our netlink talk is starting with a sequence number of '1'. Is that fine?
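The effect of sharing one global counter can be reproduced with a small sketch. The structures and function names below are stand-ins I made up for illustration, and the sketch assumes that netlink_send() also post-increments the counter each time it stamps the netlink header.

```c
/* Stand-in for the global counter shared by main() and netlink_send(). */
static unsigned int seq;

/* Stand-ins for the two headers that each carry a sequence number. */
struct fake_cn_hdr { unsigned int seq; };
struct fake_nl_hdr { unsigned int nlmsg_seq; };

/* What main() does once: stamp the connector header. */
static void stamp_connector(struct fake_cn_hdr *cn)
{
	cn->seq = seq++; /* stores 0, counter becomes 1 */
}

/* What netlink_send() is assumed to do on every call: stamp the
 * netlink header from the same counter. */
static void stamp_netlink(struct fake_nl_hdr *nl)
{
	nl->nlmsg_seq = seq++;
}
```

Under that assumption the connector header keeps the value 0 for every message, because it is stamped only once, while the netlink header starts at 1 and increases with each send.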
Looking into some of the helper macros defined in linux/netlink.h, I will try to summarize my understanding of the ones that are directly or indirectly being used in this program.
#define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(NLMSG_HDRLEN))
So this macro first aligns the netlink message header length and then adds the payload length to it. In our case the netlink payload is a connector header without any payload of its own, and the macro is used like so:
nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
Here, what I do not understand is the actual payload length of the netlink message. In the above case, it is the size of the connector message header (since the connector message itself contains no payload of its own) minus sizeof(*nlh), which I read as the size of the pointer nlh (which points to the first byte of the netlink message and thereby to the netlink message header). Like any other pointer variable, that would be equal to the machine word size, which in my case is 4 bytes. Why are we subtracting this from the size of the connector message header?
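One way to check the sizes involved is a small sketch like the one below. In C, sizeof(*nlh) is the size of the object the pointer refers to (the struct nlmsghdr), whereas sizeof(nlh) is the size of the pointer variable. The helper names are made up for illustration, and length_for() only mirrors the expression quoted above; it assumes size already includes the netlink header.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Size of what nlh points at, i.e. the netlink header structure.
 * sizeof does not evaluate its operand, so nothing is dereferenced. */
static size_t header_size(void)
{
	struct nlmsghdr *nlh = NULL;
	return sizeof(*nlh);
}

/* Size of the pointer variable itself (the machine word size). */
static size_t pointer_size(void)
{
	struct nlmsghdr *nlh = NULL;
	return sizeof(nlh);
}

/* Hypothetical helper mirroring the quoted expression: strip the
 * header size from size, then let NLMSG_LENGTH add it back. */
static size_t length_for(size_t size)
{
	return NLMSG_LENGTH(size - sizeof(struct nlmsghdr));
}
```

If size was computed as header plus payload, subtracting sizeof(*nlh) leaves the payload length, and NLMSG_LENGTH then adds the aligned header length back on.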
After that, we send the message over this netlink socket just like any other IPv4 socket.