
I have a DX9 application that runs on an embedded Windows XP box. When left running unattended overnight for soak testing, it crashes after about six to eight hours. On our dev machines (Win 7) we can't seem to reproduce this issue. I'm also fairly certain it's not a memory leak.

  • If we run the same application in Debug on the embedded machines, it doesn't crash.
  • If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.

I know that Debug builds add extra padding bytes around local stack variables, which may be "hiding" an out-of-bounds access to a local array, or that some uninitialized variable may be sneaking through.

So I have two questions:

  1. Does __try/__except behave similarly to Debug, even when run in Release?
  2. What kind of things should I be scanning the code for if we have a crash in Release mode, but not in Debug mode?
K-ballo
Game_Overture
  • *"What kind of things should I be scanning the code for if we have a crash in Release mode, but not in Debug mode?"* - that'd be `assert()` or other debug code with side-effects – Flexo May 21 '12 at 21:22
  • First, there is no try/except, there is try/catch. Second, there may be many reasons why you don't observe crash in debug, one of them you might need to run it a lot longer. – Gene Bushuyev May 21 '12 at 21:26
  • No, I'm using `__try`/`__except`. They're similar, but they have underlying differences. I'm not too sure about the details, however. – Game_Overture May 21 '12 at 21:31
  • @GeneBushuyev: `__try`/`__except` are Windows-specific, possibly VC++-specific constructs that support SEH, an operating-system managed exception dispatching mechanism that has nothing to do with C++ exceptions. It's used mostly to dispatch hardware-generated errors (page faults, access violations, FP exceptions and the like), which usually shouldn't be trapped. – Matteo Italia May 21 '12 at 21:41
  • What kind of programmer is not sure about the details of the language features he's using? (a bad one) – shoosh May 21 '12 at 21:41
  • @shoosh: We're talking C++ here. Most programmers do not know every detail of every language feature. Good ones have learned how to use the features safely, which still doesn't require memorizing every corner case. – Ben Voigt May 21 '12 at 21:51
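
As a minimal sketch of the `assert()` pitfall mentioned in the first comment above: when a check that has a side effect is compiled out, the side effect disappears with it. The `CHECK` macro, the `SIMULATE_RELEASE` flag, and the `FrameQueue` type below are hypothetical stand-ins, not names from the application in question. `CHECK` only mimics how the standard `assert` expands to nothing when `NDEBUG` is defined.

```cpp
#include <cassert>

// Hypothetical switch standing in for NDEBUG, so both behaviours can be
// compared without changing the build configuration.
#define SIMULATE_RELEASE 1

#if SIMULATE_RELEASE
#define CHECK(expr) ((void)0)      // argument is discarded, never evaluated
#else
#define CHECK(expr) assert(expr)   // argument is evaluated and tested
#endif

// Hypothetical work queue with three pending items.
struct FrameQueue {
    int pending = 3;
    bool pop() {
        if (pending == 0) return false;
        --pending;
        return true;
    }
};

// Buggy: the pop() call lives inside the check, so a release-style build
// never executes it and the queue is left untouched.
inline int drain_one_buggy(FrameQueue& q) {
    CHECK(q.pop());
    return q.pending;
}

// Fixed: do the work unconditionally and check only the stored result.
inline int drain_one_fixed(FrameQueue& q) {
    const bool ok = q.pop();
    CHECK(ok);
    (void)ok;  // silence "unused variable" when CHECK expands to nothing
    return q.pending;
}
```

With `SIMULATE_RELEASE` set to 1, `drain_one_buggy` leaves all three items pending while `drain_one_fixed` leaves two. Searching the code base for calls or assignments inside `assert(...)` and similar debug-only macros is a cheap first pass when Debug and Release behave differently.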

2 Answers


If you're using __try { } __except(), you shouldn't.
Those and C++ code don't mix well (for instance, you can't have C++ objects with destructors on the stack of a function that uses them). You should use C++ try { } catch () { } instead. If you use catch (...) (with an ellipsis) and compile with /EHa, it does basically the same as __except().

Both try/catch and __try/__except behave the same in Debug and Release.

If you suspect that your problem is an unexpected exception you should read about all of the following:

SetUnhandledExceptionFilter()
_set_se_translator()

_CrtSetReportMode()
_RTC_SetErrorFunc()
_set_abort_behavior()
_set_error_mode()
_set_new_handler()
_set_new_mode()
_set_purecall_handler()
set_terminate()
set_unexpected()
_set_invalid_parameter_handler()
_controlfp()

Using one of the first two would probably allow you to pinpoint your problem pretty quickly. The rest are there if you want absolute control for all error cases possible in your process.

Specifically, with SetUnhandledExceptionFilter() you can install a filter function that logs the address of the code that caused the exception. You can then use your debugger to pinpoint that code. Using the DbgHelp library and the information passed to the filter function, you can also write code that prints a full stack trace of the crash, including symbols and line numbers.
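
A rough sketch of such a filter follows, under some assumptions: the helper names (`format_crash_line`, `log_crash_filter`, `install_crash_logger`), the log format, and the `crash.log` file name are all made up for illustration. Only `SetUnhandledExceptionFilter` and the `EXCEPTION_POINTERS` / `EXCEPTION_RECORD` structures are real Windows API. The Windows-specific part is guarded so the formatting helper can be built and checked on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: formats one log line into a caller-supplied buffer,
// so nothing is allocated on the heap while the process is crashing.
inline const char* format_crash_line(char* out, std::size_t size,
                                     unsigned long code,
                                     std::uintptr_t address) {
    assert(out != nullptr && size > 0);
    std::snprintf(out, size, "code=0x%08lX addr=0x%llX", code,
                  static_cast<unsigned long long>(address));
    return out;
}

#ifdef _WIN32
// Called by the OS when no other handler takes the exception.
static LONG WINAPI log_crash_filter(EXCEPTION_POINTERS* info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    char line[64];
    format_crash_line(line, sizeof line, rec->ExceptionCode,
                      reinterpret_cast<std::uintptr_t>(rec->ExceptionAddress));
    if (FILE* f = std::fopen("crash.log", "a")) {  // assumed log location
        std::fwrite(line, 1, std::strlen(line), f);
        std::fputc('\n', f);
        std::fclose(f);
    }
    return EXCEPTION_EXECUTE_HANDLER;  // let the process terminate
}

inline void install_crash_logger() {
    SetUnhandledExceptionFilter(log_crash_filter);
}
#else
inline void install_crash_logger() {}  // no-op off Windows
#endif
```

Calling `install_crash_logger()` once at startup would be enough for this sketch. The logged address can then be looked up against the release build's symbols or map file to find the faulting function.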

Make sure you set up your build configuration to emit debug symbols for release builds as well. They can only help, and they don't slow your application down (though they may make it bigger).

shoosh
  • IIRC `catch(...)` to catch SEH exceptions was disabled by default in some recent version of VC++ (see e.g. comments [here](http://stackoverflow.com/questions/1373686/unable-to-catch-c-exception-using-catch)). – Matteo Italia May 21 '12 at 21:42
  • @Matteo: It depends on `/EHs` vs `/EHa`. But using `_set_se_translator()` should help as well. – Ben Voigt May 21 '12 at 21:45

If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.

Then do that.

A single __try block around the whole program (as well as around the entry point of each worker thread) is the recommended approach: it lets you write out a crash dump and make an error report before exiting. There's not much recovery you can do with SEH, because the exceptions just don't carry enough information to distinguish different failures usefully. Storing the whole program state and pulling it into a debugger is very useful, though.

Note: Some video drivers raise SEH exceptions that they also catch. Perhaps some logic expects more than one SEH scope to be installed, which your __try block provided.
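
The top-level `__try` approach described above might look roughly like the following sketch. The names `write_dump` and `run_guarded`, the `crash.dmp` file name, and the choice of `MiniDumpNormal` are assumptions made for illustration; `MiniDumpWriteDump` comes from the DbgHelp library. The SEH part is guarded by `_MSC_VER`, with a plain pass-through on other compilers.

```cpp
#include <cassert>

#ifdef _MSC_VER
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Filter helper: writes a minidump, then asks for the handler to run.
static LONG write_dump(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileA("crash.dmp", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;
}

// Runs the whole program body (or a worker thread's entry point) inside a
// single SEH scope. This function holds no C++ objects with destructors,
// which is what allows __try to be used in it.
int run_guarded(int (*body)()) {
    assert(body != nullptr);
    int result = -1;
    __try {
        result = body();
    } __except (write_dump(GetExceptionInformation())) {
        result = -1;  // report failure and exit; do not try to carry on
    }
    return result;
}
#else
// Other compilers have no __try/__except; just call through.
int run_guarded(int (*body)()) {
    assert(body != nullptr);
    return body();
}
#endif
```

In this sketch, `main` would do little more than `return run_guarded(real_main);`, where `real_main` is a hypothetical function holding the actual program. Keeping the `__try` in a small wrapper like this leaves the rest of the code free to use ordinary C++ objects.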

Ben Voigt
  • You should be careful not to block the SEH exceptions used by guard pages and the like (see e.g. [here](http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx)). I think that the `SetUnhandledExceptionFilter` solution would not present these inconveniences (but I'm not sure, I prefer not to play with SEH because there are lots of details that I may forget). – Matteo Italia May 21 '12 at 21:47
  • @Matteo: Those should be trapped by the OS exception handling code before it starts checking user exception filters. Only in case of an actual stack overflow (OS tried to enlarge stack, but failed) do you need to worry about guard pages causing access violations. – Ben Voigt May 21 '12 at 21:49
  • Are you sure? As far as I can tell from Raymond Chen's article it's expected that guard pages exceptions traverse the whole stack, and the OS code catches them at the top of the stack, just before declaring them uncaught. So, an unhandled exception filter should be fine, while a top-level (but still not real-top of the stack) `__try` could give problems. – Matteo Italia May 21 '12 at 21:52
  • @Matteo: Read the comments... the OS intercepts stack growth cases before any user-mode exception filters are checked. The difficulty is if you trigger the guard page of some other thread. If that happens, though, emitting a crash dump and exiting is the best approach, since it gives you the opportunity to bring the state back to a developer computer and see what happened (may or may not reveal cause). Note that I didn't recommend swallowing exceptions. – Ben Voigt May 21 '12 at 22:00