
I'm trying to implement basic communication between a kernel module and a userspace program using netlink sockets (libnl on the user side). The userspace program sends a message to the kernel and expects a reply. Unfortunately, receiving the reply fails with return value -16 (EBUSY).

Interestingly enough, when I receive data from the netlink socket directly, using the standard recv system call on nl_socket_get_fd(sock), everything works fine!

Does anyone have an idea why this is happening?

Here is the userspace code (parse_cb is a callback that doesn't get invoked):

struct nl_sock *sock;
struct nl_msg *msg;
int family, res;

// Allocate a new netlink socket
sock = nl_socket_alloc();

// Connect to generic netlink socket on kernel side
genl_connect(sock);

// Ask kernel to resolve family name to family id
family = genl_ctrl_resolve(sock, PSVFS_FAMILY_NAME);

// Construct a generic netlink message: allocate a new message, fill in
// the header and append a simple string attribute.
msg = nlmsg_alloc();
genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, family, 0, NLM_F_ECHO,
        PSVFS_C_INIT, PSVFS_VERSION);
nla_put_string(msg, PSVFS_A_MSG, "stuff");

// Send message over netlink socket
nl_send_auto_complete(sock, msg);

// Free message
nlmsg_free(msg);

nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, parse_cb, NULL);

res = nl_recvmsgs_default(sock);
printf("After receive %i.\n", res);

Here is the kernel-side callback for the message sent by the userspace program (this one gets invoked properly):

int psvfs_vfs_init(struct sk_buff *skb2, struct genl_info *info) {
    send_to_daemon("VFS initialized.", PSVFS_C_INIT, info->snd_seq+1, info->snd_pid);

    return 0;
}

And here is the 'send_to_daemon' function:

int send_to_daemon(char* msg, int command, int seq, u32 pid) {
    int res = 0;
    struct sk_buff* skb;
    void* msg_head;

    skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
    if (skb == NULL) {
        res = -ENOMEM;
        goto out;
    }

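    msg_head = genlmsg_put(skb, 0, seq, &psvfs_gnl_family, 0, command);
    if (msg_head == NULL) {
        res = -ENOMEM;
        nlmsg_free(skb);  // skb was not sent, so release it here
        goto out;
    }

    res = nla_put_string(skb, PSVFS_A_MSG, msg);
    if (res != 0) {
        nlmsg_free(skb);  // skb was not sent, so release it here
        goto out;
    }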

    genlmsg_end(skb, msg_head);

    res = genlmsg_unicast(&init_net, skb, pid);
    if (res != 0)
        goto out;

  out:
    return res;
}
ghik

1 Answer


OK, I found what was wrong here.

I finally found out that libnl functions have their own error codes, different from standard POSIX return codes and -16 stands for NLE_SEQ_MISMATCH.
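The confusion is easy to reproduce without libnl: the same number 16 reads as EBUSY when interpreted as an errno value, and as NLE_SEQ_MISMATCH when interpreted as a libnl code. The sketch below is only an illustration; `describe_code` and `SKETCH_NLE_SEQ_MISMATCH` are made-up names, and the constant is copied from the value stated above rather than taken from the library headers. In real libnl 3 code, `nl_geterror(res)` from `<netlink/errno.h>` should give the library's own message for a return value.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumption: local stand-in for libnl's NLE_SEQ_MISMATCH, using the
 * value given in the answer (16). Real code should rely on
 * <netlink/errno.h> instead of redefining it. */
#define SKETCH_NLE_SEQ_MISMATCH 16

/* Hypothetical helper: describe a negative return value under both
 * interpretations so they can be compared side by side. */
void describe_code(int res, char *buf, size_t len)
{
    int code = -res;
    const char *as_libnl = (code == SKETCH_NLE_SEQ_MISMATCH)
                               ? "NLE_SEQ_MISMATCH"
                               : "(not modelled here)";

    /* strerror() gives the POSIX reading; its text is platform-dependent. */
    snprintf(buf, len, "errno reading: %s; libnl reading: %s",
             strerror(code), as_libnl);
}
```

Calling `describe_code(-16, buf, sizeof buf)` fills `buf` with both readings, which shows why the POSIX interpretation in the question was misleading.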

The problem was caused by bad sequence numbers that I assigned to my messages.
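The kernel handler in the question replies with info->snd_seq+1, so the likely fix is to echo info->snd_seq unchanged, although the exact change is not spelled out here. The following is a simplified model of the check, not libnl's actual code; all names in it are hypothetical and the error value is the one quoted in this answer.

```c
#include <stdint.h>

/* Assumption: value of libnl's NLE_SEQ_MISMATCH as stated in the answer. */
#define MODEL_NLE_SEQ_MISMATCH 16

/* Simplified model of a receiver that expects the reply to carry the
 * same sequence number as the request it just sent. */
int check_reply_seq(uint32_t expected_seq, uint32_t reply_seq)
{
    return (reply_seq == expected_seq) ? 0 : -MODEL_NLE_SEQ_MISMATCH;
}

/* Two ways the kernel side could pick the reply's sequence number. */
uint32_t reply_seq_echo(uint32_t request_seq)
{
    return request_seq;      /* echo the request's number */
}

uint32_t reply_seq_incremented(uint32_t request_seq)
{
    return request_seq + 1;  /* what the question's handler does */
}
```

In this model, a reply built with `reply_seq_echo` passes the check, while one built with `reply_seq_incremented` is rejected with -16, matching the symptom described in the question.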

ghik
  • This is a tad late but I am having the same problem, and no one on the internet updates libnl documentation.... How did you fix it? My callback function isn't being called at all, and I get -16 errors. I am assigning a sequence number of "0" to all of them because I don't care about sending messages in sequence. If I call "nl_socket_disable_seq_check", my callback function is still not called but now the "nl_recvmsgs_default" only returns 0. – Chris Apr 12 '12 at 16:14
  • Check that the policy is correct. When you allocate memory for the policy it may contain garbage, so some fields of the netlink structures may be initialized with wrong values. Also check that you set the callback handler for all messages, because you may have set the handler for the wrong type. – t0k3n1z3r Oct 15 '13 at 06:23