
I am a developer/maintainer of a commercial network appliance product, and this question concerns a customer issue. The application is written in C/C++, runs on MontaVista, and the process has 8 threads.

Two threads that were processing a Kerberos AP-REP packet through GSS-API and the Heimdal krb5 API have deadlocked. The deadlock is between two Kerberos credential cache functions: krb5_cc_destroy and krb5_cc_cache_match.

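Thread 5, executing krb5_cc_destroy->mcc_destroy, owns a mutex that Thread 6 is waiting for. Thread 6, executing krb5_cc_cache_match->krb5_cccol_cursor_next->mcc_get_cache_next, owns a mutex that Thread 5 is waiting for. In the traces below, Thread 5 (LWP 11229) is blocked on the global mcc_mutex, whose __owner is 11230, and Thread 6 (LWP 11230) is blocked on the mutex at 0x7fc7a2dea9d8, whose __owner is 11229.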

I am going through the source code of the krb5_cc* functions at the point of the deadlock, but in the meantime I wanted to ask for help here. Can someone explain the purpose of these functions and how I should go about resolving this? I searched but could not find a known issue related to it. Any help that sheds more light on this deadlock is welcome.
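For reference, the two traces look like a classic lock-order inversion: each thread holds one mutex and waits for the other. The following is only a minimal sketch of the usual remedy (always taking the two locks in the same order), not Heimdal's actual code. All names here (`global_lock`, `entry_lock`, `destroy_like`, `iterate_like`, `run_ordered_demo`) are hypothetical stand-ins, and I am assuming one lock is table-wide and the other per-cache.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Heimdal code: global_lock plays the role of a
 * table-wide mutex, entry_lock the role of a per-cache mutex. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t entry_lock  = PTHREAD_MUTEX_INITIALIZER;
static int operations_done = 0;

/* Every code path takes the locks in the same order: global first, then
 * entry. With a single agreed order, a circular wait cannot form. */
static void lock_both(void)
{
    pthread_mutex_lock(&global_lock);
    pthread_mutex_lock(&entry_lock);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&entry_lock);
    pthread_mutex_unlock(&global_lock);
}

/* Stand-in for a "destroy"-style operation that needs both locks. */
static void *destroy_like(void *arg)
{
    (void)arg;
    lock_both();
    operations_done++;
    unlock_both();
    return NULL;
}

/* Stand-in for an "iterate"-style operation that needs both locks. */
static void *iterate_like(void *arg)
{
    (void)arg;
    lock_both();
    operations_done++;
    unlock_both();
    return NULL;
}

/* Runs both operations concurrently and returns how many completed. */
int run_ordered_demo(void)
{
    pthread_t t1, t2;
    int rc;

    operations_done = 0;
    rc = pthread_create(&t1, NULL, destroy_like, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, iterate_like, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return operations_done;
}
```

Calling `run_ordered_demo()` from `main` (built with `-pthread`) should return once both threads have finished. If one of the two functions instead took `entry_lock` first and `global_lock` second, the threads could block each other in the same way as threads 5 and 6 in the traces below.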

Below is the GDB stack trace of both threads (threads 5 and 6):

(gdb) t 5
[Switching to thread 5 (Thread 0x7fc80afff700 (LWP 11229))]
#5  0x00007fc818319dfd in get_ccache (id=0x7fc7a2dea118, destroy=0x7fc7a2dea130, context=0x7fc7a1f5d680) at ntlm/kdc.c:108
108     in ntlm/kdc.c
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fc81440df1a in _L_lock_979 () from /lib64/libpthread.so.0
#2  0x00007fc81440dd6b in __GI___pthread_mutex_lock (mutex=0x7fc8128ddda0 <mcc_mutex>) at pthread_mutex_lock.c:64
#3  0x00007fc8126a48f4 in mcc_destroy (context=0x7fc7a1f5d680, id=<optimized out>) at mcache.c:228
#4  0x00007fc81267da7f in krb5_cc_destroy (context=0x7fc7a1f5d680, id=0x7fc7a36e9e80) at cache.c:644
#5  0x00007fc818319dfd in get_ccache (id=0x7fc7a2dea118, destroy=0x7fc7a2dea130, context=0x7fc7a1f5d680) at ntlm/kdc.c:108
#6  kdc_alloc (waas_user=0, ctx=0x7fc7a49bbc08, minor=0x7fc80afbdc4c) at ntlm/kdc.c:185
#7  kdc_alloc (minor=0x7fc80afbdc4c, ctx=0x7fc7a49bbc08, waas_user=<optimized out>) at ntlm/kdc.c:161
#8  0x00007fc818315db8 in _gss_ntlm_acquire_cred (min_stat=0x7fc80afbdc4c, desired_name=0x7fc7a5f1ad70, time_req=<optimized out>, desired_mechs=<optimized out>, cred_usage=<optimized out>,
    output_cred_handle=<optimized out>, actual_mechs=0x0, time_rec=0x7fc80afbdbd4) at ntlm/acquire_cred.c:80
#9  0x00007fc81830ceda in gss_acquire_cred (minor_status=0x7fc80afbdc4c, desired_name=0x7fc7a1051240, time_req=4294967295, desired_mechs=<optimized out>, cred_usage=2,
    output_cred_handle=0x7fc80afbdc50, actual_mechs=0x0, time_rec=0x0) at mech/gss_acquire_cred.c:165
#10 0x00007fc81831a38d in select_mech (minor_status=0x7fc8128ddda0 <mcc_mutex>, mechType=0x80, verify_p=0, mech_p=0xffffffffffffffff) at spnego/accept_sec_context.c:329
#11 0x0000000000000000 in ?? ()
(gdb) f 2
#2  0x00007fc81440dd6b in __GI___pthread_mutex_lock (mutex=0x7fc8128ddda0 <mcc_mutex>) at pthread_mutex_lock.c:64
64      pthread_mutex_lock.c: No such file or directory.
(gdb) p *mutex
$21 = {__data = {__lock = 2, __count = 0, __owner = 11230, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\336+\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb)



(gdb) t 6
[Switching to thread 6 (Thread 0x7fc80a3ff700 (LWP 11230))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fc81440df1a in _L_lock_979 () from /lib64/libpthread.so.0
#2  0x00007fc81440dd6b in __GI___pthread_mutex_lock (mutex=0x7fc7a2dea9d8) at pthread_mutex_lock.c:64
#3  0x00007fc8126a443e in mcc_get_cache_next (context=0x7fc7a2fb4980, cursor=0x7fc7a3de46a8, id=0x7fc80a3bd9d8) at mcache.c:439
#4  0x00007fc81267e9cb in krb5_cccol_cursor_next (context=context@entry=0x7fc7a2fb4980, cursor=0x7fc7a6574970, cache=cache@entry=0x7fc80a3bd9d8) at cache.c:1459
#5  0x00007fc81267eeb9 in krb5_cc_cache_match (context=0x7fc7a2fb4980, client=0x7fc7a133abe0, id=0x7fc7a2133478) at cache.c:1146
#6  0x00007fc818319e10 in get_ccache (id=0x7fc7a2133478, destroy=0x7fc7a2133490, context=0x7fc7a2fb4980) at ntlm/kdc.c:110
#7  kdc_alloc (waas_user=0, ctx=0x7fc7a4bc8208, minor=0x7fc80a3bdc4c) at ntlm/kdc.c:185
#8  kdc_alloc (minor=0x7fc80a3bdc4c, ctx=0x7fc7a4bc8208, waas_user=<optimized out>) at ntlm/kdc.c:161
#9  0x00007fc818315db8 in _gss_ntlm_acquire_cred (min_stat=0x7fc80a3bdc4c, desired_name=0x7fc7a65749c0, time_req=<optimized out>, desired_mechs=<optimized out>, cred_usage=<optimized out>,
    output_cred_handle=<optimized out>, actual_mechs=0x0, time_rec=0x7fc80a3bdbd4) at ntlm/acquire_cred.c:80
#10 0x00007fc81830ceda in gss_acquire_cred (minor_status=0x7fc80a3bdc4c, desired_name=0x7fc7a128ca60, time_req=4294967295, desired_mechs=<optimized out>, cred_usage=2,
    output_cred_handle=0x7fc80a3bdc50, actual_mechs=0x0, time_rec=0x0) at mech/gss_acquire_cred.c:165
#11 0x00007fc81831a38d in select_mech (minor_status=0x7fc7a2dea9d8, mechType=0x80, verify_p=0, mech_p=0xffffffffffffffff) at spnego/accept_sec_context.c:329
#12 0x0000000000000000 in ?? ()
(gdb) f 2
#2  0x00007fc81440dd6b in __GI___pthread_mutex_lock (mutex=0x7fc7a2dea9d8) at pthread_mutex_lock.c:64
64      in pthread_mutex_lock.c
(gdb) p *mutex
$20 = {__data = {__lock = 2, __count = 0, __owner = 11229, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\335+\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb)

---
  • _"what is the purpose of these methods"_ >> you should look into the `kinit`, `klist` and `kdestroy` Linux command-line utilities that manage Kerberos credentials in the local Kerberos cache. At least with the MIT Kerberos implementation; I'm not sure about Heimdal. – Samson Scharfrichter May 15 '20 at 12:00
  • Note that the CLI utilities imply a "server-wide" cache accessible to all processes owned by the same user (based on a filesystem, e.g. `FILE:`, or on a system API, e.g. `KEYRING:`). Maybe your code manages a private, in-memory cache that is local to the process. Different race conditions... – Samson Scharfrichter May 15 '20 at 12:10
  • Thanks, Samson, for your suggestions. I found a GitHub issue in Heimdal where this defect is acknowledged: https://github.com/heimdal/heimdal/issues/432 – user2595010 May 19 '20 at 08:45
