0

I'm asking this question because we're really stuck finding the cause of a software crash. I know that questions like "Why does the software crash?" are not appreciated, but we really don't know how to find the problem.

We are currently running a long-term test of our software. To find potential memory leaks, we used the Windows tool Performance Monitor to track several memory metrics, such as Private Bytes, Working Set and Virtual Bytes.

The software ran for quite a long time (about 30 hours) without any problems. It does the same thing all the time: reading an image from the hard drive, doing some inspection and showing some results.

Then suddenly it crashes. Inspecting the memory metrics in Performance Monitor, we saw a strange, steep rise in the Working Set graph at 10:17 AM. We have encountered this several times, and according to the dump files the exception code is always 0xc0000005 ("the thread tried to read from or write to a virtual address for which it does not have the appropriate access"), but it occurs at different positions in the code, in places where no pointers are used.

Does anyone know what could cause such a steep rise of the working set, and why it could crash the software? How can we find out whether our software has a bug when the crash occurs at a different position every time?

The application is written in C++ and runs on a Windows 7 32-bit PC.

Crash happens after the rise of the working set memory

AquilaRapax
  • You probably know already that it's almost impossible to tell you what might be wrong with your software. Are you using multithreading? Because that would open the window for a lot of problems, especially of that kind where you get unexpected behaviour after a couple of minutes of runtime or maybe even only after many hours. Problems like using mutexes in the wrong way or forgetting to lock one. – Potaito Jun 03 '15 at 10:02
  • I could quite certainly say that yes, your software does have a bug. And the crash is probably due to allocating more memory than you can on a 32bit system. – Sami Kuhmonen Jun 03 '15 at 10:03
  • @potAito: You're right, that could be a problem, since it is multithreaded. I know that it could run for hours and hours without crashing, but I thought that running it for more than a day would be evidence enough that it's not a threading problem. :( – AquilaRapax Jun 03 '15 at 10:19
  • @SamiKuhmonen: Yes, I guessed that too, but the exception does not really fit that, does it? Shouldn't it be a bad alloc exception in this case? – AquilaRapax Jun 03 '15 at 10:19
  • @AquilaRapax Not necessarily. For example `malloc()` will return `NULL` if it can't allocate memory. Then if you dereference it without checking, you may very well get this error. If all allocations are with C++ and exceptions are on, then there should be another exception, though. – Sami Kuhmonen Jun 03 '15 at 10:20
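
To illustrate the distinction made in the last comment, here is a minimal sketch of the two allocation paths. The helper names `alloc_zeroed` and `try_new` are hypothetical and not taken from the asker's code. With `malloc()` a failure is reported only through a null return value, so a missing check turns into an access violation at the first write; with `new` the failure normally surfaces as `std::bad_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical helper: allocate a zero-filled buffer with malloc().
// Returns nullptr on failure instead of writing through a null pointer.
unsigned char* alloc_zeroed(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr) {
        // Without this check, the memset below would write to address 0,
        // which is the kind of access that raises 0xc0000005 on Windows.
        return nullptr;
    }
    std::memset(p, 0, bytes);
    return static_cast<unsigned char*>(p);
}

// Hypothetical helper: the same request through operator new[].
// Failure is reported as std::bad_alloc (or a type derived from it).
bool try_new(std::size_t bytes) {
    try {
        unsigned char* p = new unsigned char[bytes];
        delete[] p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

A caller would check the result of `alloc_zeroed` before using it and release the buffer with `std::free`.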

2 Answers

1

It's actually impossible to know from the information that you have provided, but I would suggest that you have some memory corruption (hence the access violation). It could be a buffer-overflow issue: for example, a string is missing its terminating null character, so something keeps being appended to it indefinitely.
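
As a rough sketch of the kind of guard that prevents the missing-terminator case, the hypothetical helper below (`copy_terminated` is an illustrative name, not part of any library) copies a C string into a fixed-size buffer and always writes the terminating null, reporting truncation to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dst (capacity dst_size bytes) and
// always terminate the result. Returns false if the text was truncated.
bool copy_terminated(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    const std::size_t len = std::strlen(src);
    const bool fits = len < dst_size;                    // room for text plus '\0'?
    const std::size_t n = fits ? len : dst_size - 1;     // bytes of text to copy
    std::memcpy(dst, src, n);
    dst[n] = '\0';                                       // never leave dst unterminated
    return fits;
}
```

Where the code base allows it, `std::string` avoids this class of bug altogether.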

The recommended next step is to download the Debugging Tools for Windows suite. Set up WinDbg with the correct symbol files and analyse the stack trace to find the general area of the crash. Depending on the cause of the memory corruption, this will be more or less useful: you could have corrupted the memory a long time before the crash occurs.

Ideally also run a static analysis tool on the code.

Dennis
  • We're already using the Visual Studio debugger to analyze the dump file and tried WinDbg as well, but unfortunately the crash always occurs at a different position. Could you please give me the name of a "static analysis tool"? – AquilaRapax Jun 03 '15 at 10:34
  • Parasoft make one for C++, but you can try SonarQube with the C++ plugins as well. I haven't used it myself for C++ but I have for Java. As for the initial corruption, you should look for occurrences of dangerous behaviours, such as C-string (`char*`) concatenations, copies, truncations etc., and review your uses of the `new` and `delete` keywords. – Dennis Jun 03 '15 at 16:12
1

Given the information you have now, there is little chance of getting an answer. You need more information, more specifically:

  1. Get more intelligence (is there anything specific about the files that cause the crash? What about the last-but-one file?)

  2. Insert more tracing and logging (as much as you can without making it 2x slower). At least you'll see where it crashes, and you will then be able to insert more tracing/logging around that place.

  3. As you're on Windows, consider handling 0xc0000005 via `_set_se_translator`, converting it into a C++ exception, and adding even more logging along the path on which this exception is unwound.
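
A minimal sketch of point 3, under stated assumptions: the exception class `SehError` and the functions `translate_seh` and `install_seh_translator` are hypothetical names; only `_set_se_translator` (declared in `<eh.h>`) is the real CRT call. The translator only works when the code is compiled with `/EHa`, and it is maintained per thread, so each worker thread has to install it. The Windows-specific parts are guarded so the rest still compiles elsewhere.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <eh.h>
#endif

// Hypothetical exception type that carries the structured-exception code.
class SehError : public std::runtime_error {
public:
    explicit SehError(unsigned int code)
        : std::runtime_error(describe(code)), code_(code) {}

    unsigned int code() const { return code_; }

    // Formats the code as text, e.g. for a log line.
    static std::string describe(unsigned int code) {
        char buf[48];
        std::snprintf(buf, sizeof buf, "structured exception 0x%08X", code);
        return buf;
    }

private:
    unsigned int code_;
};

#ifdef _WIN32
// Called by the runtime on the faulting thread; rethrows as a C++ exception
// so that catch blocks on the way up can log where unwinding passes through.
void translate_seh(unsigned int code, EXCEPTION_POINTERS*) {
    throw SehError(code);
}
#endif

// Call at the start of every thread (the translator is per-thread).
// On non-Windows builds this is a no-op.
void install_seh_translator() {
#ifdef _WIN32
    _set_se_translator(translate_seh);
#endif
}
```

With this in place, a `catch (const SehError& e)` around each processing step can log `e.what()` together with the step name before rethrowing. Continuing to run after an access violation is not safe if memory is already corrupted, so this is a diagnostic aid rather than a recovery mechanism.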

There is no silver bullet for this kind of problem, only gathering more information and figuring it out.
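
For the tracing suggested in point 2, one possible shape is a small logger that flushes after every line, so the last entry written before a crash has already been handed to the operating system. This is only a sketch; the class name `TraceLog` and its interface are assumptions. A mutex is included because the application is multithreaded.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>

// Hypothetical trace logger: appends one line per call and flushes at once.
class TraceLog {
public:
    explicit TraceLog(const std::string& path)
        : out_(path, std::ios::out | std::ios::app) {}

    // True if the file was opened and no write has failed so far.
    bool ok() const { return out_.good(); }

    // Thread-safe: writes the message and flushes it immediately.
    void line(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << msg << '\n';
        out_.flush();
    }

private:
    std::ofstream out_;
    std::mutex mutex_;
};
```

Flushing on every line costs throughput, so in a hot loop it may be better to log only at step boundaries such as "image loaded" or "inspection finished".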

P.S. As a long shot: I've seen similar things caused by a bug in the MS heap. If you're not using the LFH (low-fragmentation heap) yet (not sure, it might be the default now), there is a 1% chance that changing your default heap to LFH will help.

No-Bugs Hare
  • Thanks for your answer. I know that it's hard to give an answer to such a problem, since it could be anything. We just have one image file and one configuration to test with, to eliminate variation. We should certainly add more logging. We will try out `_set_se_translator` and have a look at LFH. Thanks for that. – AquilaRapax Jun 03 '15 at 10:42