I have a crash dump from a customer who is hitting an issue that neither we nor they can reproduce in-house. The product typically crashes only after they release it to their end-user, which has made it very difficult to work out what's going on. We did get a crash dump from them, though I should say I'm still learning my way around WinDbg, and crash dump analysis in general. It's a .NET app that calls into our unmanaged DLL through interop. Our module doesn't appear on the call stack of any thread at the time of the crash, so at first glance it doesn't look like our fault. The customer also can't share the actual application, or even a reasonable sample that mimics what they're doing, because of security restrictions.
However, the end-user only started experiencing the issue after upgrading to a more recent release, so although that doesn't prove we're at fault, it seems highly likely.
I'm not expecting a magic answer to my problem. I'm looking for techniques, or an approach, for root-causing a crash like this using only the dump.
I suspect heap corruption, where the corruption itself happens early but doesn't bring down the process until much later. The call stack of the suspect thread doesn't give us much to go on: it looks like something is being freed that shouldn't be, and an access violation is reported.
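To illustrate what I mean by the corruption being noticed late, here is a minimal sketch of a toy heap. It is only a model, not how the Windows heap works internally: `ToyHeap`, `MAGIC`, and the 4-byte header are all made up for illustration. An unchecked write overruns one block and clobbers the header of the next, and nothing complains until that second block is freed.

```python
# Toy model only: a bytearray "heap" where each block is a 4-byte header
# (a magic value) followed by the payload. This mimics the *timing* of
# detection, not the real Windows heap layout.
MAGIC = b"HEAP"
HEADER = len(MAGIC)


class ToyHeap:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.top = 0  # bump pointer; blocks are never reused in this model

    def alloc(self, n):
        """Reserve n payload bytes and return the payload offset."""
        if self.top + HEADER + n > len(self.mem):
            raise MemoryError("toy heap exhausted")
        self.mem[self.top:self.top + HEADER] = MAGIC
        payload = self.top + HEADER
        self.top = payload + n
        return payload

    def write(self, payload, data):
        """Unchecked copy, like memcpy: nothing stops it running past the block."""
        self.mem[payload:payload + len(data)] = data

    def free(self, payload):
        """Only validates the header; this is the first place damage is noticed."""
        if bytes(self.mem[payload - HEADER:payload]) != MAGIC:
            raise RuntimeError(f"corrupt header for block at offset {payload}")


def demo():
    heap = ToyHeap(64)
    a = heap.alloc(8)   # payload at offset 4
    b = heap.alloc(8)   # header at 12..15, payload at offset 16
    events = []
    heap.write(a, b"A" * 12)  # bug: 12 bytes into an 8-byte block
    events.append("overflow written, no error")
    heap.free(a)              # the overflowing block itself frees cleanly
    events.append("free(a) ok")
    try:
        heap.free(b)          # the innocent neighbour is where it blows up
    except RuntimeError as exc:
        events.append(f"free(b) failed: {exc}")
    return events
```

In this model the failure surfaces in `free(b)`, on a code path that has nothing to do with the buggy write, which matches what I think I'm seeing: a free that faults on a stack where our module never appears.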
One thing of note is that the exception context record (the .ecxr command) seems to be trashed: esp=0 and ebp=0. That makes me wonder whether I can get anything of value from this dump, because until now I've usually been able to get a valid call stack from .ecxr. If I look at the suspect thread directly, though, I do get a valid call stack. The debugger's automated analysis (!analyze) doesn't give me much insight either, other than that some memory was freed that shouldn't have been.
One good suggestion was to have the customer enable Page Heap with GFlags.exe so the corruption is caught as soon as it happens, if that is what's going on, but given the customer's setup that probably won't be possible. So I have to assume this crash dump is all I'm ever going to get from them, and that I have to solve the issue with it alone.
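As I understand it, full page heap places each allocation directly in front of an inaccessible page, so an overrun faults at the offending write instead of at some later free. A minimal sketch of that idea follows, again as a toy model: `GuardedHeap` and its bounds check are hypothetical stand-ins for the guard page, not a real API.

```python
# Toy model only: each block is followed by a guard range that no write may
# touch, loosely imitating the no-access page that full page heap puts
# after an allocation. All names here are hypothetical.
class GuardedHeap:
    GUARD = 16  # size of the no-access range after each block

    def __init__(self):
        self.blocks = {}  # payload offset -> payload size
        self.top = 0

    def alloc(self, n):
        """Reserve n payload bytes plus a trailing guard range."""
        payload = self.top
        self.blocks[payload] = n
        self.top = payload + n + self.GUARD
        return payload

    def write(self, payload, data):
        """Fault at the overrunning write itself, like an AV on a guard page."""
        size = self.blocks[payload]
        if len(data) > size:
            raise MemoryError(
                f"write of {len(data)} bytes overruns {size}-byte block at {payload}"
            )
        # A real heap would copy the bytes here; the model only checks bounds.


def demo_page_heap():
    heap = GuardedHeap()
    a = heap.alloc(8)
    try:
        heap.write(a, b"A" * 12)  # same bug: 12 bytes into an 8-byte block
    except MemoryError as exc:
        return str(exc)
    return "no fault"
```

Here the fault is raised inside the buggy write, so the call stack at the time of the fault would point at the culprit. That is exactly the information I'm missing in the dump I have.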
I find myself spinning my wheels on this. Maybe if people could share stories of terribly difficult crash dump analyses, it would give me a new path to try. I can read some assembly, but it seems to me that experts in this area have many techniques up their sleeves before resorting to that, and I'm hoping they can share some with me.