
Server: Windows Server 2012 R2, Debug Diagnostic Tool v2.1 Update 1

The debugger is attached to the application pool, and I've confirmed it's the correct pool for the site. The pool crashes; however, a dump file is never generated.

"Application pool '' is being automatically disabled due to a series of failures in the process(es) serving that application pool."

The rule is simply set to monitor the application pool, without capturing first-chance exceptions. I've tried deleting and re-adding it a few times, and it never generates a dump.

I checked the debug logs it generated, and this is the last exception logged right before the pool crashes:

WARNING: Frame IP not in any known module. Following frames may be wrong.
0x0
0x0
0x0

Edit: I wanted to add that dumps do generate for first-chance exceptions. The problem only occurs when trying to capture the second-chance exception, i.e. the one that is actually causing the crash.

Edit 2: Last few lines from one of the debug logs, as requested:

[9/16/2015 7:21:31 PM]
  Exception 0XC00000FD on thread 154788.  DetailID = 48
  Thread created. New thread system id - System ID: 85156
  Thread exited. Exiting thread system id - System ID: 85156. Exit code - 0x00000000
  Thread exited. Exiting thread system id - System ID: 326816. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 41368. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 213340. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 300224. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 51008. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 45288. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 75176. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 143512. Exit code - 0x800703e9
  Thread exited. Exiting thread system id - System ID: 68504. Exit code - 0x800703e9
....... (it goes on like this for a while)
Process exited. Exit code - 0x800703e9

The exception correlates to this entry:

DetailID = 48
    Count:    1
    Exception #:  0XC00000FD
    Stack:        

        WARNING: Frame IP not in any known module. Following frames may be wrong.
        0x0
        0x0
        0x0
        0x0
        0x0
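For reference, the two codes in the log can be decoded mechanically: `0xC00000FD` is an NTSTATUS and `0x800703E9` is an HRESULT. A rough Python sketch (field layouts follow the standard Windows encodings, with severity in the top bits, then the facility, then a 16-bit code):

```python
def decode_hresult(hr):
    """Split an HRESULT into (severity bit, facility, code)."""
    return ((hr >> 31) & 1, (hr >> 16) & 0x1FFF, hr & 0xFFFF)

sev, facility, code = decode_hresult(0x800703E9)
print(sev, facility, code)  # 1 (failure), 7 (FACILITY_WIN32), 1001

# Win32 error 1001 is ERROR_STACK_OVERFLOW ("Recursion too deep;
# the stack overflowed"), which the CLR surfaces as COR_E_STACKOVERFLOW.

# For the NTSTATUS, the top two bits of 0xC00000FD are 0b11 = error
# severity; the value itself is STATUS_STACK_OVERFLOW.
print(hex(0xC00000FD >> 30))  # 0x3
```

So both the in-process exception and the process exit code point at the same thing: a stack overflow.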
Noah Sparks
  • Perhaps you're suffering from [Known issue relating to Windows 7 kernel symbols](http://stackoverflow.com/questions/32278634/is-there-a-known-issue-relating-to-windows-7-kernel-symbols), which could also apply to Server 2012R2. Check if your ntdll symbols have type information. Depending on how DebugDiag works internally, it needs that type information. – Thomas Weller Sep 22 '15 at 13:40
  • Hmm, it sounds like this is more related to reading the dumps once they generate. In this case I can't get the dump to generate to begin with. – Noah Sparks Sep 22 '15 at 14:04
  • "Debugger is attached" - what does that mean? A debugger would usually be notified of an unhandled exception, so it would break. Then, the second chance exception is gone and DebugDiag cannot handle the exception any more. – Thomas Weller Sep 22 '15 at 14:09
  • BTW: maybe tag this as [tag:debugdiag] – Thomas Weller Sep 22 '15 at 14:10
  • Just that I've told DebugDiag to monitor the application pool that is crashing. Thanks, I've added the tag. – Noah Sparks Sep 22 '15 at 14:18

1 Answer


By default, a DebugDiag crash rule takes dumps only for unhandled second-chance exceptions (that is, if you create a crash rule and leave all settings at their defaults). So if dumps are not being generated, the process is not crashing with a second-chance exception.

At times the CLR calls the TerminateProcess function when it encounters a fatal exception (stack overflow being one of them). If your process is crashing with this kind of exception, you won't get dumps using the default rule; you should change the rule to include the ntdll TerminateProcess breakpoint, which is present in the default breakpoint list. The downside of enabling this breakpoint is that you will now get dumps even for SAFE EXITS (worker process idle shutdowns, recycles, etc.), so you need to check the timestamp of the logged event and match the dump file against it.
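Because the TerminateProcess breakpoint also fires on clean shutdowns, you end up correlating dump files with the crash event by time. That triage step can be sketched as follows (the directory path and event time in the usage comment are hypothetical, not from your setup):

```python
import os
from datetime import datetime, timedelta

def find_dumps_near(dump_dir, event_time, window_minutes=2):
    """Return .dmp files whose modification time falls within
    `window_minutes` of the crash event logged in the System log."""
    matches = []
    for name in os.listdir(dump_dir):
        if not name.lower().endswith(".dmp"):
            continue
        path = os.path.join(dump_dir, name)
        mtime = datetime.fromtimestamp(os.path.getmtime(path))
        if abs(mtime - event_time) <= timedelta(minutes=window_minutes):
            matches.append(path)
    return matches

# Example: if the event-log entry says the pool failed at 7:21 PM:
# dumps = find_dumps_near(r"C:\Program Files\DebugDiag\Logs\CrashRule",
#                         datetime(2015, 9, 16, 19, 21))
```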

It would help if you could paste the last 5-10 lines of the w3wp text files generated by DebugDiag.

EDIT: Adding the Callstack as I see in the debugger...

0:065> kL 50
# ChildEBP RetAddr  
00 1a8f291c 74b80947 ntdll!NtTerminateProcess
01 1a8f292c 73e0843d KERNELBASE!TerminateProcess+0x23
02 1a8f29b8 73e07d03 clr!EEPolicy::HandleFatalStackOverflow+0x1ba
03 1a8f29e8 73dca49f clr!EEPolicy::HandleStackOverflow+0x1ac
04 1a8f2a0c 76f500b1 clr!COMPlusFrameHandler+0x9b
05 1a8f2a30 76f50083 ntdll!ExecuteHandler2+0x26
06 1a8f2afc 76f507ff ntdll!ExecuteHandler+0x24
07 1a8f2afc 17732c83 ntdll!KiUserExceptionDispatcher+0xf
08 1a8f309c 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x1b
09 1a8f3184 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0a 1a8f326c 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0b 1a8f3354 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0c 1a8f343c 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0d 1a8f3524 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0e 1a8f360c 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
0f 1a8f36f4 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
10 1a8f37dc 17733104 App_Web_lotdetail_aspx_cdcab7d2_hoxucj_s!Unknown+0x49c
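The repeating `Unknown+0x49c` return address in the frames above is the classic signature of runaway recursion in the page's generated code. The bug pattern looks roughly like this (a Python sketch with a hypothetical function name; note that in the CLR a StackOverflowException is fatal and cannot be caught in managed code, whereas Python raises a catchable RecursionError):

```python
import sys

def render_lot(lot_id, depth=0):
    # Hypothetical stand-in for the call chain in lotdetail.aspx:
    # a method that re-enters itself with no terminating condition.
    return render_lot(lot_id, depth + 1)

try:
    render_lot(42)
except RecursionError:
    # Python stops at sys.getrecursionlimit() frames. The CLR instead
    # unwinds via EEPolicy::HandleFatalStackOverflow and calls
    # TerminateProcess, which is why no managed handler ever runs and
    # why the default crash rule never sees a second-chance exception.
    print("stack exhausted after ~%d frames" % sys.getrecursionlimit())
```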
Puneet Gupta
  • Thanks for the detailed reply. I added the breakpoint and will see if it helps. Unfortunately this crash only occurs every few days, so there's no way to know quickly. I edited the main topic with the last bit from one of the logs; is that the info you are looking for? – Noah Sparks Sep 22 '15 at 15:09
  • Ok, I hope you will get the right dumps now. I forgot that the logging format changed a bit with DebugDiag 2.0, so no, that is not what I wanted to see. Basically the log files contain exit codes, something like "Process exited. Exit code - 0xfffffffe", and prior to 2.0 this used to be the last line in the logs. Search upwards from the bottom of the log file for "Process exited" and see if you can find the exit code, as that might help to understand a few things. – Puneet Gupta Sep 23 '15 at 08:31
  • Ok thanks, I updated the main topic with the relevant information. That entry from the log is the one that correlates with the process crash in the System logs and is the last one. – Noah Sparks Sep 23 '15 at 12:23
  • BTW, the crash is due to a stack overflow exception, as I mentioned in my initial comment: `C:\err>err 0x800703e9` → `# for hex 0x800703e9 / decimal -2147023895 : COR_E_STACKOVERFLOW corerror.h` – Puneet Gupta Sep 25 '15 at 11:07
  • Well, the dump files generated this time, but it seems there is nothing relevant in them. It says CrashHangAnalysis failed: "Unable to cast object of type 'System.UInt64' to type 'System.String'". I tried reconfiguring the rule again, this time using the stack overflow exception. – Noah Sparks Sep 30 '15 at 14:49
  • It seems the DebugDiag analysis failed to run. If you can share the dump file somewhere I can download it, I can take a look at why it failed. But first you have to find the right dump, the one from the actual crash. Do you see a thread stack printed in the report or not? – Puneet Gupta Sep 30 '15 at 18:07
  • The dump file explains why analysis failed: you are collecting *minidumps*. For CLR-level debugging, you should always collect a FULL USER DUMP. All I can determine from the dump is that the stack overflow is happening inside some function in lotdetail.aspx, so you may want to check the code of that page for functions that are called recursively and see if there is a logic bug. I've added the callstack to the original answer. – Puneet Gupta Oct 01 '15 at 05:06
  • Thanks Puneet. I thought it was set to full, but maybe not. I'll pass this along to the developer and see if he can find the cause inside that page. – Noah Sparks Oct 01 '15 at 12:46