
I know I'm grasping at straws here, but this one is a mystery. Any pointers or help would be most welcome, so I'm appealing to those more intelligent than I am:

We have a crash that shows up in our release binaries only. The crash takes place as the binary is shutting down and terminating the sub-libraries it depends on. Reproducibility depends on the machine: some machines reproduce the crash 100% of the time, some never exhibit it, and some are in between. The crash is deep within one of the sub-libraries, and there is a good likelihood the stack is corrupt by the time the rubble can be brought into a debugger (MSVC 2008 SP1) to be examined. Running the binary under the debugger prevents the bug from happening, as does remote debugging, as does (of all things) connecting to the machine via VNC. We also tried installing the Microsoft Driver Development Kit, and doing so squelches the bug as well.

What would be the next best place to look? What tools would be best in this circumstance? Does it sound like a race condition, or something else?

Ken Bloom
fbrereto
  • Sounds a lot like a threading bug of some kind. Is there a pattern to the machines that work vs not work? E.g. single-core/dual-core etc. The VNC thing is a mystery - is there any graphics work done during shutdown? – mdma May 28 '10 at 20:36
  • Does your app (or its libraries) use any windows? If so, you may be unloading a DLL that is still needed to process messages (e.g., the DLL contains the wndproc but is unloaded before the window is destroyed). – jdigital May 28 '10 at 20:48
  • @mdma: No pattern we've been able to detect thus far. – fbrereto May 28 '10 at 20:52
  • VNC might just be a timing thing: its polling, capturing, compression, and transmission of the screen bitmaps could slow the machine down enough to avoid the race condition. Just a thought. – Aardvark May 28 '10 at 20:52
  • @jdigital: We do have some GUI, but its destruction has not begun at the point the crash is seen. – fbrereto May 28 '10 at 20:53
  • So you are shutting down libraries before you destroy the GUI? This could be the problem. – jdigital May 28 '10 at 21:06

4 Answers

1

Have you tried Rational Purify? I used it some 4-5 years ago, and at the time it was helpful in tracking down memory bugs, stack corruption, invalid handles, etc.

mdma
1

Try AppVerifier and GFlags together, with page heap enabled, to find heap corruption.

You'll likely need WinDbg as your debugger instead of Visual Studio to debug.

I also recommend this book on advanced Windows debugging for tracking down crashes such as the one you are hitting.

selbie
1

Are you using the thread pool by any chance, and not cancelling or waiting for outstanding work items to complete before shutting down?

Alienfluid
0

The problem was a conflicting setting of the pernicious _SECURE_SCL flag under Visual Studio, causing silent ABI incompatibilities between the DLL and one of its dependencies.
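For anyone chasing something similar, here is a rough sketch of a startup check that might catch this kind of mismatch early. It assumes, as I understand the VS2008 STL, that `_SECURE_SCL` can change the layout of containers and iterators, so two modules built with different values disagree about object sizes. The names `StlBuildInfo`, `current_stl_build_info`, and `stl_build_info_matches` are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of the STL-related settings one module was compiled with.
struct StlBuildInfo {
    int secure_scl;             // value of _SECURE_SCL, or -1 if the macro is undefined
    std::size_t vector_size;    // sizeof(std::vector<int>) as seen by this module
    std::size_t iterator_size;  // sizeof(std::vector<int>::iterator) as seen by this module
};

// Compile this into every module so each one reports its own settings.
// It is inline on purpose: each module gets a copy built with its own flags.
inline StlBuildInfo current_stl_build_info() {
    StlBuildInfo info;
#if defined(_SECURE_SCL)
    info.secure_scl = _SECURE_SCL;
#else
    info.secure_scl = -1;  // macro not defined (e.g. a non-MSVC compiler)
#endif
    info.vector_size = sizeof(std::vector<int>);
    info.iterator_size = sizeof(std::vector<int>::iterator);
    return info;
}

// Returns true only if both modules agree on the flag and on the type sizes.
inline bool stl_build_info_matches(const StlBuildInfo& a, const StlBuildInfo& b) {
    return a.secure_scl == b.secure_scl
        && a.vector_size == b.vector_size
        && a.iterator_size == b.iterator_size;
}
```

To use it, each DLL would export a small non-inline wrapper that returns `current_stl_build_info()`, and the executable would compare every module's result against its own at startup, logging or aborting on a mismatch. The struct holds only plain integers so that it can cross the DLL boundary safely even when the STL layouts differ.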

fbrereto