
I have a kernel module and a corresponding userspace module which use Netlink to communicate. The following code is used in the kernel module to send data to userspace:

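int msglen = len - FRAME_PACKET_HEADER_SIZE;
/* One skb per message; nlmsg_new() sizes it for the header plus msglen payload bytes. */
struct sk_buff* skb = nlmsg_new(msglen, GFP_ATOMIC);
if (skb)
{
    /* The last argument of nlmsg_put() is the nlmsg_flags value, not a GFP mask. */
    struct nlmsghdr* nlh = nlmsg_put(skb, 0, 0, NLMSG_DONE, msglen, NLM_F_REQUEST);
    NETLINK_CB(skb).dst_group = 0; /* unicast, no multicast group */

    memcpy(nlmsg_data(nlh), &buf[FRAME_PACKET_HEADER_SIZE], msglen);
    /* nlmsg_unicast() takes ownership of skb, whether it succeeds or fails. */
    status = nlmsg_unicast(mod_data->netlink_sock, skb, mod_data->netlink_pid);
}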

During a period of high data activity (Netlink messages sent from the kernel to userspace), nlmsg_new starts returning NULL. The activity is a file transfer, which is pushed to userspace in 16k blocks. After some debugging, I found that when nlmsg_new fails, I can still successfully allocate a message slightly smaller than the one I actually need: the required size is 16336 bytes, and an allocation of 16000 bytes succeeds.
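One possible way to reason about why 16000 works while 16336 does not is to estimate how many bytes each skb actually asks the allocator for. The sketch below is userspace C and only does arithmetic. It assumes that the skb data area is the aligned Netlink message plus a trailing `struct skb_shared_info`, and that the total is rounded up to a power-of-two size class. The helper names are hypothetical, and the `shinfo_size` and `cache_line` arguments are assumptions that depend on the kernel build, not values taken from this system.

```c
#include <stddef.h>
#include <linux/netlink.h>   /* NLMSG_HDRLEN, NLMSG_ALIGN */

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Smallest power of two >= n: a stand-in for the allocator's size class. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Estimate the bytes requested for one Netlink skb carrying `payload` bytes.
 * ASSUMPTIONS: shinfo_size approximates sizeof(struct skb_shared_info) and
 * cache_line approximates the kernel's data alignment; both vary by build.
 */
static size_t estimate_skb_alloc(size_t payload, size_t shinfo_size,
                                 size_t cache_line)
{
    size_t msg = NLMSG_ALIGN(NLMSG_HDRLEN + (unsigned int)payload);
    size_t total = align_up(msg, cache_line) + align_up(shinfo_size, cache_line);
    return next_pow2(total);
}
```

With the assumed values `shinfo_size = 256` and `cache_line = 64`, a 16336-byte payload lands in the 32768-byte class (eight contiguous 4 KiB pages), while a 16000-byte payload still fits in 16384 bytes. If that holds on this kernel, the larger request would be more sensitive to memory fragmentation under GFP_ATOMIC.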

Questions:

  1. The documentation I have read suggests that calling nlmsg_new for each message to be sent is correct. It seems that there is no way to reuse the sk_buff object because it may wait in a queue for a while and nlmsg_unicast handles deallocation when the message is actually sent (so no manual nlmsg_free is required). Is this definitely the case? Is there a way to possibly reuse the buffer allocated by nlmsg_new?

  2. I was wondering whether there are a bunch of ACK messages being queued up which are filling up some buffers somewhere. I set nlmsg_flags to NLM_F_REQUEST, so the userspace module should not be sending an ACK. Is this correct?

  3. Any other ideas why the allocation is failing?
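Regarding question 2, my understanding is that an acknowledgement is only requested when NLM_F_ACK is set in nlmsg_flags; NLM_F_REQUEST on its own does not ask for one. A minimal userspace sketch of that check follows (the helper name `expects_ack` is hypothetical, not part of any Netlink API):

```c
#include <stdbool.h>
#include <linux/netlink.h>   /* struct nlmsghdr, NLM_F_REQUEST, NLM_F_ACK */

/* True only if the sender explicitly asked for an acknowledgement. */
static bool expects_ack(const struct nlmsghdr *nlh)
{
    return (nlh->nlmsg_flags & NLM_F_ACK) != 0;
}
```

A header carrying only NLM_F_REQUEST makes this return false, so a receiver that follows the convention would not reply with an ACK unless its own code sends one.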

For context, this is running on an embedded ARM with 256 MiB RAM. Kernel is 3.14.28.

The userspace module is constantly servicing the receive queue with calls to recvmsg, so I don't think the receive buffer is filling up. If it were, I would expect the kernel module to allocate a buffer successfully and the call to nlmsg_unicast to return -EAGAIN, which isn't happening.
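For reference, this is roughly how each buffer returned by recvmsg can be handled on the userspace side. It is a sketch, not the actual module code: `total_payload` and `was_truncated` are hypothetical helper names, and the socket setup and the recvmsg call itself are omitted. It walks every complete Netlink message in a received buffer with the standard NLMSG_* macros and checks the MSG_TRUNC flag that recvmsg reports when the supplied buffer was too small.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>      /* MSG_TRUNC */
#include <linux/netlink.h>   /* NLMSG_OK, NLMSG_NEXT, NLMSG_PAYLOAD */

/* True if recvmsg() reported (via msg_flags) that the datagram was cut short. */
static bool was_truncated(int msg_flags)
{
    return (msg_flags & MSG_TRUNC) != 0;
}

/* Sum the payload bytes of every complete Netlink message in buf[0..len). */
static size_t total_payload(void *buf, size_t len)
{
    size_t sum = 0;
    int remaining = (int)len;
    struct nlmsghdr *nlh = buf;

    for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining))
        sum += NLMSG_PAYLOAD(nlh, 0);   /* nlmsg_len minus the aligned header */
    return sum;
}
```

Comparing the byte count from `total_payload` against what the kernel side sent is one way to confirm that nothing is being dropped or truncated on the way up.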

EDIT: I saw a note in linux/netlink.h which was interesting:

/*
 *  skb should fit one page. This choice is good for headerless malloc.
 *  But we should limit to 8K so that userspace does not have to
 *  use enormous buffer sizes on recvmsg() calls just to avoid
 *  MSG_TRUNC when PAGE_SIZE is very large.
 */

so I tried reducing the message size so that each Netlink message plus its header was under 8192 bytes, but the same failure still occurred (just somewhat later, as expected).
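The arithmetic behind that split can be sketched as follows, in userspace C. The helper names are hypothetical, and the 8192-byte limit passed in is simply the figure from the header comment, not something the API enforces here.

```c
#include <stddef.h>
#include <linux/netlink.h>   /* NLMSG_HDRLEN */

/* Largest 4-byte-aligned payload whose header + payload fits in `limit` bytes. */
static size_t max_payload(size_t limit)
{
    return (limit - NLMSG_HDRLEN) & ~(size_t)3;
}

/* Number of messages needed to carry `total` payload bytes under `limit`. */
static size_t chunks_needed(size_t total, size_t limit)
{
    size_t per_msg = max_payload(limit);
    return (total + per_msg - 1) / per_msg;   /* ceiling division */
}
```

For a limit of 8192 bytes this gives at most 8176 payload bytes per message, so a 16336-byte block needs two messages, which also doubles the number of skb allocations per block.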

EDIT 2: After looking at the reported available memory, it looks like the sk_buff objects allocated via nlmsg_new are never being released. As I understand it, they should be released when the reference count (users in sk_buff) drops to 0. Is there any reason Netlink would hold on to the buffer, perhaps while waiting for an ACK that never arrives?
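To put numbers on the memory trend, one option is to sample a field such as MemFree before and after a transfer. The sketch below parses text in the /proc/meminfo format; `meminfo_kb` is a hypothetical helper name, and reading the file into a string is left out.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB for `key` (e.g. "MemFree") in meminfo-style text, or -1. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        long val;
        /* Match "key:" at the start of a line, then read the number after it. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &val) == 1)
            return val;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

If MemFree keeps falling across transfers and never recovers once the receive queue is drained, that would support the suspicion that the buffers are not being freed.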
