I have a kernel module and a corresponding userspace module which use Netlink to communicate. The following code is used in the kernel module to send data to userspace:
    int msglen = len - FRAME_PACKET_HEADER_SIZE;
    struct sk_buff* skb = nlmsg_new(msglen, GFP_ATOMIC);
    if (skb)
    {
        struct nlmsghdr* nlh = nlmsg_put(skb, 0, 0, NLMSG_DONE, msglen, GFP_ATOMIC);
        nlh->nlmsg_flags = NLM_F_REQUEST;
        NETLINK_CB(skb).dst_group = 0;
        memcpy(nlmsg_data(nlh), &buf[FRAME_PACKET_HEADER_SIZE], msglen);
        status = nlmsg_unicast(mod_data->netlink_sock, skb, mod_data->netlink_pid);
    }
During a period of high data activity (Netlink messages sent from kernel to userspace), nlmsg_new starts returning NULL, i.e. the allocation fails. The high data activity is a file transfer, which is pushed to userspace in 16k blocks. After some debugging, I found that when nlmsg_new fails, I can still successfully allocate a message slightly smaller than the one I actually need (the required size is 16336 bytes; an allocation of 16000 bytes works).
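As a rough sanity check on the two sizes above, here is a userspace sketch of how the skb data area might round up to a power-of-two number of pages. The page size, cache line size, and the size of struct skb_shared_info are all assumptions for a 32-bit ARM build, not measured values, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a 32-bit ARM build; none of these were measured. */
#define ASSUMED_PAGE_SIZE   4096u
#define ASSUMED_CACHE_LINE  64u
#define ASSUMED_SHINFO_SIZE 256u /* guess at the aligned sizeof(struct skb_shared_info) */
#define NL_HDRLEN           16u  /* aligned sizeof(struct nlmsghdr) */

/* Round n up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Approximate bytes requested for the skb data area by nlmsg_new(payload). */
static size_t skb_data_bytes(size_t payload)
{
    size_t msg = align_up(NL_HDRLEN + payload, 4); /* header + payload, 4-byte aligned */
    return align_up(msg, ASSUMED_CACHE_LINE) + ASSUMED_SHINFO_SIZE;
}

/* Smallest order such that (page size << order) >= bytes. */
static unsigned page_order(size_t bytes)
{
    unsigned order = 0;
    while (((size_t)ASSUMED_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}
```

Under these assumptions, a 16336-byte payload needs slightly more than 16 KiB and would round up to 8 contiguous pages, while a 16000-byte payload still fits in 4. If that is right, it might explain why the smaller GFP_ATOMIC allocation keeps working for a while, but it does not explain where the memory is going.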
Questions:
1. The documentation I have read suggests that calling nlmsg_new for each message to be sent is correct. It seems that there is no way to reuse the sk_buff object because it may wait in a queue for a while and nlmsg_unicast handles deallocation when the message is actually sent (so no manual nlmsg_free is required). Is this definitely the case? Is there a way to possibly reuse the buffer allocated by nlmsg_new?
2. I was wondering whether there are a bunch of ACK messages being queued up which are filling up some buffers somewhere. I set nlmsg_flags to NLM_F_REQUEST, so the userspace module should not be sending an ACK. Is this correct?
3. Any other ideas why the allocation is failing?
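To make the second question concrete: as far as I understand, an acknowledgement is requested with the NLM_F_ACK flag rather than NLM_F_REQUEST. The userspace sketch below builds a header with the same type and flags as the kernel snippet and checks for that bit. The helper names wants_ack and make_header are hypothetical, not part of any Netlink API.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Returns non-zero if the sender of this header asked for an acknowledgement. */
static int wants_ack(const struct nlmsghdr *nlh)
{
    assert(nlh != NULL);
    return (nlh->nlmsg_flags & NLM_F_ACK) != 0;
}

/* Hypothetical helper: fill a header like the one the kernel snippet produces. */
static struct nlmsghdr make_header(unsigned int payload_len, unsigned short flags)
{
    struct nlmsghdr nlh;

    memset(&nlh, 0, sizeof(nlh));
    nlh.nlmsg_len = NLMSG_LENGTH(payload_len); /* header plus payload */
    nlh.nlmsg_type = NLMSG_DONE;
    nlh.nlmsg_flags = flags;
    return nlh;
}
```

With only NLM_F_REQUEST set, wants_ack returns 0, which matches my expectation that the userspace side has no reason to reply.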
For context, this is running on an embedded ARM with 256 MiB RAM. Kernel is 3.14.28.
The userspace module is constantly servicing the receive queue with a call to recvmsg, so I don't think the receive buffer is becoming full. I think if this were the case, the kernel module would successfully allocate a buffer, but the call to nlmsg_unicast would return -EAGAIN (which isn't happening).
EDIT:
I saw a note in linux/netlink.h which was interesting:
/*
* skb should fit one page. This choice is good for headerless malloc.
* But we should limit to 8K so that userspace does not have to
* use enormous buffer sizes on recvmsg() calls just to avoid
* MSG_TRUNC when PAGE_SIZE is very large.
*/
So I tried reducing the message size so that each Netlink message plus its header was < 8192 bytes, but the same failure still occurred (just somewhat later, as expected).
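The splitting arithmetic I used is roughly the following userspace sketch. The 8192-byte limit is taken from the header comment, the helper names are made up for illustration, and the limit here is inclusive (header plus payload is at most the limit).

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Largest payload such that NLMSG_SPACE(payload) still fits within limit bytes. */
static size_t max_payload_per_msg(size_t limit)
{
    assert(limit > (size_t)NLMSG_HDRLEN);
    /* Round down to the Netlink alignment so padding cannot push us over. */
    return (limit - (size_t)NLMSG_HDRLEN) & ~((size_t)NLMSG_ALIGNTO - 1);
}

/* Number of messages needed to carry total bytes of payload. */
static size_t chunks_needed(size_t total, size_t limit)
{
    size_t per = max_payload_per_msg(limit);
    return (total + per - 1) / per;
}
```

For a 16336-byte block and an 8192-byte limit this gives 8176 bytes of payload per message, so each block goes out as two messages.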
EDIT 2:
After looking at the reported available memory, it looks like the sk_buff objects allocated via nlmsg_new are never being released. It seems that they should be released when the reference count (users for sk_buff) becomes 0. Is there any reason Netlink would be holding on to the buffer, perhaps waiting for an ACK which never arrives?
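For anyone wanting to reproduce the observation, one way to track free memory from userspace is to sample a field from /proc/meminfo-style text. This is a minimal sketch rather than the exact tool I used, and meminfo_field_kb is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of one field (e.g. "MemFree") from
 * /proc/meminfo-style text, or -1 if the field is missing or malformed.
 */
static long meminfo_field_kb(const char *text, const char *field)
{
    assert(text != NULL && field != NULL);

    size_t flen = strlen(field);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A matching line starts with the field name followed by ':'. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long kb;
            if (sscanf(line + flen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* move past the newline to the next line */
    }
    return -1;
}
```

Sampling MemFree before and during the file transfer and seeing it fall steadily without recovering is what suggested to me that the buffers are not being freed.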