I can use !foreachframe
and !if
in MEX Debugging Extension for WinDbg to grep the callstack and execute a command (multiple commands separated by ; (Command Separator) are not supported) to find the thread who is not watting for the lock but whose previous call is Post
. The extension can be downloaded from here. After downloading, it can be put in C:\WinDDK\7600.16385.1\Debuggers\winext
(see also Loading Debugger Extension DLLs).
I replaced MSVCP140!mtx_do_lock
with msvcrt!_threadstartex
and replaced AZSDK!AZConnection::Post
with KERNEL32!BaseThreadInitThunk
in the following code as an example:
~*e r @$t0 = -1; !foreachframe -q -f 'KERNEL32!BaseThreadInitThunk' r @$t0= @#FrameNum - 1; .if(0<=@$t0) { !if -DoesNotContainRegex 'msvcrt!_threadstartex' -then '.printf /D "Thread: <link cmd=\"~~[%x]\">0x%x</link> (<link cmd=\"!mex.t %d\">%d</link>)", $tid, $tid, $dtid, $dtid' .frame @$t0 }
!foreachthread -q !foreachframe -q -f 'KERNEL32!BaseThreadInitThunk' !if -DoesNotContainRegex 'msvcrt!_threadstartex' -then '.printf /D "Thread: <link cmd=\"~~[%x]\">0x%x</link> (<link cmd=\"!mex.t %d\">%d</link>)", $tid, $tid, $dtid, $dtid' .frame @#FrameNum - 1
Its output example for a notepad
process:
Thread: 0xd14c (0)
Thread: 0x4f88 (1)
Thread: 0xd198 (7)
Callstacks of All threads
0:001> !foreachthread k
Child-SP RetAddr Call Site
00000062`fabaf8c8 00007ffd`54c7409d win32u!ZwUserGetMessage+0x14
00000062`fabaf8d0 00007ff7`bb4c449f USER32!GetMessageW+0x2d
00000062`fabaf930 00007ff7`bb4dae07 notepad+0x449f
00000062`fabafa30 00007ffd`570f7974 notepad+0x1ae07
00000062`fabafaf0 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fabafb20 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Changing to thread: 0xda94 (1)
Child-SP RetAddr Call Site
00000062`fae7fbc8 00007ffd`5847f01b ntdll!DbgBreakPoint
00000062`fae7fbd0 00007ffd`570f7974 ntdll!DbgUiRemoteBreakin+0x4b
00000062`fae7fc00 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fae7fc30 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Changing to thread: 0xdb60 (7)
Child-SP RetAddr Call Site
00000062`fb17f968 00007ffd`54c7004d win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
00000062`fb17f970 00007ffc`e4e7d078 USER32!MsgWaitForMultipleObjectsEx+0x9d
00000062`fb17f9b0 00007ffc`e4e7cec2 DUser!GetMessageExA+0x2f8
00000062`fb17fa50 00007ffd`54c77004 DUser!GetMessageExA+0x142
00000062`fb17fab0 00007ffd`584534a4 USER32!Ordinal2582+0x64
00000062`fb17fb50 00007ffd`54101164 ntdll!KiUserCallbackDispatcher+0x24
00000062`fb17fbc8 00007ffd`54c7409d win32u!ZwUserGetMessage+0x14
00000062`fb17fbd0 00007ffd`2e4efa3c USER32!GetMessageW+0x2d
00000062`fb17fc30 00007ffd`1d0b30f8 DUI70!StartMessagePump+0x3c
00000062`fb17fc90 00007ffd`1d0b31ce msctfuimanager!DllCanUnloadNow+0xf3e8
00000062`fb17fd50 00007ffd`570f7974 msctfuimanager!DllCanUnloadNow+0xf4be
00000062`fb17fd80 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fb17fdb0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Changing to thread: 0xc490 (8)
Child-SP RetAddr Call Site
00000062`fb1ff708 00007ffd`54c7004d win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
00000062`fb1ff710 00007ffc`e4e7d1ca USER32!MsgWaitForMultipleObjectsEx+0x9d
00000062`fb1ff750 00007ffc`e4e7cde7 DUser!GetMessageExA+0x44a
00000062`fb1ff7f0 00007ffc`e4e7ca53 DUser!GetMessageExA+0x67
00000062`fb1ff840 00007ffd`5505b0ea DUser!GetGadgetFocus+0x33b3
00000062`fb1ff8d0 00007ffd`5505b1bc msvcrt!_callthreadstartex+0x1e
00000062`fb1ff900 00007ffd`570f7974 msvcrt!_threadstartex+0x7c
00000062`fb1ff930 00007ffd`5841a261 KERNEL32!BaseThreadInitThunk+0x14
00000062`fb1ff960 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Updated:
I found Someone thought the same way (in case of dead links, frequent):
Slim reader/writer locks don't remember who the owners are, so you'll have to find them some other way
Raymond | August 10th, 2011
The slim reader/writer lock is a very convenient synchronization facility, but one of the downsides is that it doesn’t > keep track of who the current owners are.
When your thread is stuck waiting to acquire a slim reader/writer lock, a natural thing to want to know is which threads own the resource your stuck thread > waiting for.
Since there’s not facility for going from the waiting thread to the owning threads, you’ll just have to find the owning threads some other way. Here’s the thread > that is waiting for the lock in shared mode:
ntdll!ZwWaitForKeyedEvent+0xc
ntdll!RtlAcquireSRWLockShared+0x126
dbquery!CSearchSpace::Validate+0x10b
dbquery!CSearchSpace::DecomposeSearchSpace+0x3c
dbquery!CQuery::AddConfigs+0xdc
dbquery!CQuery::ResolveProviders+0x89
dbquery!CResults::CreateProviders+0x85
dbquery!CResults::GetProviders+0x61
dbquery!CResults::CreateResults+0x11c
Okay, how do you find the thread that owns the lock?
First, slim reader/writer locks are usable only within a process, so the candidate threads are the one within the process.
Second, the usage pattern for locks is nearly always something like
enter lock
do something
exit lock
It is highly unusual for a function to take a lock and exit to
external code with the lock held. (It might exit to other code within the same component, transferring the obligation to exit the lock to that other code.)
Therefore, you want to look for threads that are still inside dbquery.dll
, possibly even still inside CSearchSpace
(if the lock is a per-object lock rather > than a global one).
Of course, the possibility might be that the code that entered the lock messed up and forgot to release it, but if that’s the case, no amount of searching for it > will find anything since the culprit is long gone.
Since debugging is an exercise in optimism, we may as well proceed on the assumption that > we’re not in the case. If it fails to find the lock owner, then we may have to revisit the assumption.
Finally, the last trick is knowing which threads to ignore.
For now, you can also ignore the threads that are waiting for the lock, since they are the victims not the cause. (Again, if we fail to find the lock owner, we > can revisit the assumption that they are not the cause; for example, they may be attempting to acquire the lock recursively.)
As it happens,
there is only one thread in the process that passes all the above filters.
dbquery!CProp::Marshall+0x3b
dbquery!CRequest::CRequest+0x24c
dbquery!CQuery::Execute+0x668
dbquery!CResults::FillParams+0x1c4
dbquery!CResults::AddProvider+0x4e
dbquery!CResults::AddConfigs+0x1c5
dbquery!CResults::CreateResults+0x145
This may not be the source of the problem, but it’s a good start.
(Actually, it looks very promising since the problem is probably
that the process on the other side of the marshaller is stuck.)