We have an ASP.NET Core 2.1 application in production which sometimes (anywhere from once a day to once every two or three days) hangs and stops serving requests. This results in 502 errors from the IIS instance placed in front of it. The only way to get things working again seems to be restarting the application.
When examining a memory dump with WinDbg, we noticed that almost all of our threads (we raised the minimum number of thread pool threads) have a rather short stack trace and are stuck in WaitForSingleObject rather than somewhere in our application code. The output below was generated using the WinDbg command !mex.us:
16352 threads [stats]: 27 29 37 38 39 40 41 42 43 44 ...
00007ff898d85b84 ntdll!NtWaitForSingleObject+0x14
00007ff895e93eef KERNELBASE!WaitForSingleObjectEx+0x8f
00007ff885a1826a clr!CLRSemaphore::Wait+0x8a
00007ff885a190cf clr!ThreadpoolMgr::UnfairSemaphore::Wait+0x115
00007ff885a1927f clr!ThreadpoolMgr::WorkerThreadStart+0x28b
00007ff885ac5abf clr!Thread::intermediateThreadProc+0x86
00007ff8982f84d4 kernel32!BaseThreadInitThunk+0x14
00007ff898d4e851 ntdll!RtlUserThreadStart+0x21
Are these threads actively waiting for something else to happen, or are they done with their work and simply idle, waiting for new work to arrive?
If the first assumption is correct (which would explain the hanging application), why does the stack trace not point to the original call they are waiting on?