
My code uses ctl_enqueuedata for kernel-to-user-space communication.

I notice that SOMETIMES (I cannot reproduce it reliably) I get a crash inside ctl_enqueuedata.

When I attach the debugger, the backtrace is:

```
frame #0: 0xffffff80248bcecb mach_kernel`Debugger(message=<unavailable>) + 555 at model_dep.c:912
frame #1: 0xffffff802481d636 mach_kernel`panic(str=<unavailable>) + 198 at debug.c:336
frame #2: 0xffffff8024b4e45f mach_kernel`kauth_cred_unref_hashlocked(credp=0xffffff8035ca0d58) + 47 at kern_credential.c:4470
frame #3: 0xffffff8024b4cf7d mach_kernel`kauth_cred_unref(credp=<unavailable>) + 29 at kern_credential.c:4521
* frame #4: 0xffffff8024b9e585 mach_kernel`sodealloc(so=0xffffff8035ca0b80) + 21 at uipc_socket.c:710
frame #5: 0xffffff8024b59942 mach_kernel`ctl_unlock [inlined] ctl_sofreelastref + 354 at kern_control.c:263
frame #6: 0xffffff8024b598be mach_kernel`ctl_unlock(so=<unavailable>, refcount=<unavailable>, lr=<unavailable>) + 222 at kern_control.c:1076
frame #7: 0xffffff8024b58ebd mach_kernel`ctl_enqueuedata(kctlref=<unavailable>, unit=<unavailable>, data=<unavailable>, len=<unavailable>, flags=<unavailable>) + 301 at kern_control.c:549
frame #8: 0xffffff7fa6090efd
```

It looks as if the socket's credentials are zero.

How can this happen? Is it a kernel bug, or am I misusing ctl_enqueuedata?

pmdj
Georgy Buranov

1 Answer


The thing that strikes me as odd in this trace is that the socket is being destroyed (sodealloc) at the end of ctl_enqueuedata. This isn't what I'd expect in normal operation.

Could you have a race condition between your ctl_disconnect_func (ctl_disconnect) callback being called and your calls to ctl_enqueuedata()? Once your disconnect callback fires, you must make sure that no new data is enqueued for that connection. You should also ensure that all in-flight enqueue operations have completed by the time you return from the disconnect callback. In practice, this means holding a lock while enqueueing and acquiring that same lock in the disconnect callback while you update your data structure to deregister the connection.
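To illustrate that locking pattern, here is a minimal user-space model in C. It is a sketch under stated assumptions, not kernel code: `pthread_mutex_t` stands in for a kernel lock such as `lck_mtx_t`, `fake_enqueuedata()` is a hypothetical stand-in for `ctl_enqueuedata()`, and `client_conn` / `conn_*` are illustrative names. It does not model the kernel's own socket lock, so check the lock ordering against your xnu version before holding your own lock across the real call.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model: pthread_mutex_t stands in for a kernel lck_mtx_t. */
typedef struct {
    pthread_mutex_t lock;   /* guards `connected` and every enqueue */
    bool connected;         /* true between the connect and disconnect callbacks */
    uint32_t unit;          /* unit number handed to the connect callback */
    size_t bytes_enqueued;  /* bookkeeping for the fake enqueue */
} client_conn;

/* Hypothetical stand-in for ctl_enqueuedata(): the call that must never
   run after the disconnect callback has returned. */
static int fake_enqueuedata(client_conn *c, const void *data, size_t len) {
    (void)data;
    c->bytes_enqueued += len;
    return 0;
}

static void conn_init(client_conn *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->connected = false;
    c->unit = 0;
    c->bytes_enqueued = 0;
}

/* Models the connect callback: register the client under the lock. */
static void conn_on_connect(client_conn *c, uint32_t unit) {
    pthread_mutex_lock(&c->lock);
    c->unit = unit;
    c->connected = true;
    pthread_mutex_unlock(&c->lock);
}

/* Models the disconnect callback. Taking the same lock means an enqueue that
   is already running finishes first, and none can start afterwards. */
static void conn_on_disconnect(client_conn *c, uint32_t unit) {
    pthread_mutex_lock(&c->lock);
    assert(c->unit == unit);
    (void)unit;
    c->connected = false;
    pthread_mutex_unlock(&c->lock);
}

/* Send path: check registration and enqueue while holding the lock.
   Returns ENOTCONN instead of enqueueing once the client is gone. */
static int conn_send(client_conn *c, const void *data, size_t len) {
    int err;
    pthread_mutex_lock(&c->lock);
    if (!c->connected)
        err = ENOTCONN;
    else
        err = fake_enqueuedata(c, data, len);
    pthread_mutex_unlock(&c->lock);
    return err;
}
```

Because `conn_send()` checks `connected` and enqueues under the same lock that `conn_on_disconnect()` takes, the disconnect callback cannot return while a send is still in progress, and no send can begin after it.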

If you've verified that this is definitely not the problem in your case: which kernel version is this? I'm having a hard time matching the line numbers in the trace to the source.

pmdj
  • Yes! I also think the client is disconnecting while the kernel is sending data with ctl_enqueuedata. But I have already tried taking a lock: lck_rw_lock_exclusive(dlist.read_write_lock) in ctl_disconnect_func, and lck_shared_lock before ctl_enqueuedata. – Georgy Buranov Mar 31 '14 at 08:18
  • I am not sure it really works, though. The documentation at https://developer.apple.com/library/mac/documentation/Kernel/Reference/kern_control_header_reference/Reference/reference.html says "The ctl_disconnect_func is used to receive notification that a client has disconnected from the kernel control." If that is correct, locking will not help: the client is ALREADY disconnected and we only get notified about it afterwards. – Georgy Buranov Mar 31 '14 at 08:19
  • Also, what happens if the user-space process simply crashes (which is also possible in my case)? Then it doesn't matter what kind of lock we take in ctl_disconnect_func: the user-space side DOES NOT EXIST anymore. – Georgy Buranov Mar 31 '14 at 11:59
  • Looking over [the kern_control.c source](http://opensource.apple.com/source/xnu/xnu-2422.1.72/bsd/kern/kern_control.c) again, in particular the function `int ctl_disconnect(struct socket *so)`, which is called when userspace disconnects (via `sodisconnectlocked()` in [uipc_socket.c](https://www.opensource.apple.com/source/xnu/xnu-792.10.96/bsd/kern/uipc_socket.c)) the socket locking looks OK to me. I still think there's a race condition in your code between the disconnect function and the enqueuedata. I might be wrong of course, but that seems by far the most likely explanation. – pmdj Apr 05 '14 at 08:25
  • This was a kernel bug. Apple's reply: "Georgy, This is a status update on a bug report that you filed: Bug ID 16471801 - Random kernel crashes in ctl_enqueuedata when the user space is disconnected A fix for this issue is in development. We will follow up with you again when it is available. " – Georgy Buranov Apr 15 '14 at 12:00
  • @GeorgyBuranov nice find! – pmdj Apr 16 '14 at 00:48