
Our MongoDB instance has hung several times in production, and it looks to me like a deadlock.

Using gdb, we can see that most threads are blocked in pthread_cond_timedwait.

Is there a way to figure out which thread holds the lock?

```
(gdb) bt full
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x0000000000bba269 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
No symbol table info available.
#2  0x0000000000bbdd37 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
No symbol table info available.
#3  0x0000000000bb3ff9 in mongo::Lock::ResourceLock::lock(mongo::LockMode) ()

(gdb) info args
No symbol table info available.
(gdb) info locals
No locals.
(gdb) info registers
rax            0xfffffffffffffdfc   -516
rbx            0x7fb988055700   140434827728640
rcx            0xffffffffffffffff   -1
rdx            0x35a49  219721
rsi            0x189    393
rdi            0x636d752d4  26689884884
rbp            0x0  0x0
rsp            0x7fb988054a40   0x7fb988054a40
r8             0x636d752a8  26689884840
r9             0xffffffff   4294967295
r10            0x7fb988052990   140434827717008
r11            0x206    518
r12            0x0  0
r13            0x7fb9c9728060   140435925401696
r14            0xb9b0   47536
r15            0x7fb988055700   140434827728640
rip            0x7fb9c80a787d   0x7fb9c80a787d <clone+109>
eflags         0x206    [ PF IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0
```
bydsky
  • You'd have an easier time with a build that has debug symbols. It can still be optimized: just add `-g` to your other build options and don't strip the binary (e.g. `gcc -O3 -g`). It will still run at full speed. You may still need to look at the asm to figure out which register or memory location to check. (See the bottom of the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for asm-debugging tips.) – Peter Cordes Sep 14 '16 at 22:42
  • Actually, it would probably be a lot easier to use an instrumented pthreads library that records which thread took the lock. IDK if one exists, or whether you can affect glibc's behaviour with environment variables. (The identity of the locking thread probably isn't normally recorded anywhere, since that's only needed for debugging.) – Peter Cordes Sep 14 '16 at 22:45
  • See https://stackoverflow.com/questions/3483094/is-it-possible-to-determine-the-thread-holding-a-mutex, but you might just get thread 0, see https://stackoverflow.com/questions/6697058/pthread-mutex-lock-locks-but-no-owner-is-set – Tor Klingberg Feb 22 '18 at 17:16
