
My application sometimes segfaults, mostly in malloc() and malloc_consolidate(), according to the backtrace in gdb.

I verified that the machine has enough memory available; it didn't even start swapping. I checked the ulimits for data segment size and max memory size, and both are set to 'unlimited'. I also ran the application under valgrind and didn't find any memory errors.
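The same limits can also be checked from inside the running process, which can matter because a daemon or service may be started with different limits than your interactive shell. This is a minimal sketch, assuming Linux/glibc, where `ulimit -d` ("data seg size") maps to `RLIMIT_DATA` and `ulimit -m` ("max memory size") maps to `RLIMIT_RSS`; the helper name `report_memory_limits` is hypothetical. Note that `getrlimit()` reports bytes, while the shell's `ulimit` prints kilobytes.

```c
#define _DEFAULT_SOURCE /* make sure RLIMIT_RSS is visible on glibc */
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: prints the soft limits behind `ulimit -d`
   (RLIMIT_DATA) and `ulimit -m` (RLIMIT_RSS) for the calling process.
   Returns 0 on success, -1 if getrlimit() fails. */
int report_memory_limits(FILE *out)
{
    static const struct { int resource; const char *name; } limits[] = {
        { RLIMIT_DATA, "RLIMIT_DATA (ulimit -d)" },
        { RLIMIT_RSS,  "RLIMIT_RSS (ulimit -m)" },
    };

    for (size_t i = 0; i < sizeof limits / sizeof limits[0]; i++) {
        struct rlimit rl;
        if (getrlimit(limits[i].resource, &rl) != 0)
            return -1;
        if (rl.rlim_cur == RLIM_INFINITY)
            fprintf(out, "%s: unlimited\n", limits[i].name);
        else
            fprintf(out, "%s: %llu bytes\n", limits[i].name,
                    (unsigned long long)rl.rlim_cur);
    }
    return 0;
}
```

Calling `report_memory_limits(stderr)` early in `main()` logs the limits the process actually runs with.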

Now I'm out of ideas about what else might be causing these segfaults. Any ideas?

Update: Since I'm not finding anything with valgrind (or ptrcheck), could another application be trashing libc's memory structures, or is there a separate structure for each process?

Gene Vincent
  • Have you had it crash under valgrind? – Douglas Leeder Jun 23 '10 at 09:03
  • No, it didn't crash. It's a realtime application; under valgrind I can only put a very light load on it, and it usually only crashes under heavier load. – Gene Vincent Jun 23 '10 at 10:18
  • Which operating system is this? Judging by the toolchain, it sounds as if it may be Linux. In this case, no, other applications cannot trash your heap; it's something in your application. If this only happens under load, that makes it all the more tricky of course... What is different under load? How could this be causing you to trash the heap? Try "torturing" your application as best you can while it's running under Valgrind... how can you best reproduce the conditions that would exist under load? Maybe allocate memory gratuitously, something like that? – Martin B Jun 23 '10 at 15:41
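The "torture it under Valgrind" suggestion above can be made concrete with a small allocation-churn loop that the application (or a test harness) runs while under Valgrind. This is a minimal sketch under stated assumptions: the name `churn_heap`, the slot count and the size range are all hypothetical, and it only adds allocator pressure; it cannot by itself reproduce timing-dependent bugs of a real load. Each block is filled with a known byte pattern that is verified before the block is freed, so a stray write from elsewhere in the program into one of these blocks is counted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 256

/* Small deterministic PRNG (xorshift32) so runs are reproducible. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Hypothetical stress helper: randomly allocates and frees blocks of
   1..max_size bytes, fills each with a per-slot byte pattern, and checks
   that the pattern is intact before freeing. Returns the number of
   corrupted blocks found (0 if none), or -1 on invalid arguments or
   allocation failure. */
long churn_heap(unsigned long iterations, size_t max_size, uint32_t seed)
{
    unsigned char *blocks[SLOTS] = {0};
    size_t sizes[SLOTS] = {0};
    uint32_t state = seed ? seed : 1; /* xorshift must not start at 0 */
    long corrupted = 0;

    if (max_size == 0)
        return -1;

    for (unsigned long i = 0; i < iterations; i++) {
        size_t slot = next_rand(&state) % SLOTS;
        unsigned char pattern = (unsigned char)slot;

        if (blocks[slot]) {
            /* Verify nobody else wrote into this block, then free it. */
            for (size_t j = 0; j < sizes[slot]; j++) {
                if (blocks[slot][j] != pattern) {
                    corrupted++;
                    break;
                }
            }
            free(blocks[slot]);
            blocks[slot] = NULL;
        } else {
            size_t n = 1 + next_rand(&state) % max_size;
            blocks[slot] = malloc(n);
            if (!blocks[slot]) {
                corrupted = -1;
                break;
            }
            memset(blocks[slot], pattern, n);
            sizes[slot] = n;
        }
    }

    for (size_t s = 0; s < SLOTS; s++)
        free(blocks[s]); /* free(NULL) is a no-op */
    return corrupted;
}
```

Running this from a spare thread alongside the real workload, under `valgrind`, is one way to approximate "allocate memory gratuitously" without needing full production load.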



From http://www.gnu.org/s/libc/manual/html_node/Heap-Consistency-Checking.html#Heap-Consistency-Checking:

Another possibility to check for and guard against bugs in the use of malloc, realloc and free is to set the environment variable MALLOC_CHECK_. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.
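Usually you would just set the variable on the command line, e.g. `MALLOC_CHECK_=2 ./app`. As far as I know, glibc picks it up when the process starts, so calling `setenv()` inside an already running program is not expected to change that program's own allocator; it has to be in the environment before `exec`. A minimal sketch, assuming a POSIX system, of a launcher that does this (the helper name `run_with_malloc_check` is hypothetical). Also hedged: on glibc 2.34 and later, `MALLOC_CHECK_` is only honoured when `libc_malloc_debug.so` is preloaded via `LD_PRELOAD`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical launcher: runs argv[0] (searched in PATH) with MALLOC_CHECK_
   set to `level` in the child's environment only. Returns the child's exit
   status, 128 + signal number if it was killed by a signal (e.g. 134 for
   SIGABRT on Linux, as when MALLOC_CHECK_=2 detects corruption), or -1 if
   fork() or waitpid() fails. */
int run_with_malloc_check(char *const argv[], const char *level)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: put the variable into the environment *before* exec,
           so the new program's allocator sees it at startup. */
        if (setenv("MALLOC_CHECK_", level, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

With level `"2"`, a detected corruption should surface as an immediate abort of the child (return value 134 on Linux), close to where the heap was damaged, rather than as a later segfault in `malloc_consolidate()`.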

Vadim Kotov
BillTorpey
  • Interestingly, my code was crashing in `malloc_consolidate` somewhere deep within the Google Test library... Setting MALLOC_CHECK_ to any of 0, 1, or 2 seems to prevent the crash, but no matter which value I use, it doesn't print any additional diagnostic information, so I still have no clue what was causing the error. – tjwrona1992 Jun 14 '22 at 17:10
  • Either way, I can run my test now, so take my upvote lol – tjwrona1992 Jun 14 '22 at 17:10

Most likely, you're trashing the heap -- i.e., you're writing beyond the limits of a piece of memory you allocated, and this is overwriting the data structures that malloc() uses to manage the heap. When malloc() later walks those corrupted structures, it can follow a bad pointer to an invalid address, and your application crashes.
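For illustration, here is a hypothetical example of this kind of bug (not taken from the question's code; the name `copy_string` is made up): a string copy into an allocation that is one byte too small. The buggy version is kept in a comment so the block itself stays safe to run.

```c
#include <stdlib.h>
#include <string.h>

/* A classic way to trash the heap (shown only as a comment, do not use):
 *
 *     char *p = malloc(strlen(s));   // no room for the '\0'
 *     strcpy(p, s);                  // writes strlen(s) + 1 bytes
 *
 * The terminating '\0' lands one byte past the requested size. Depending on
 * how the allocator rounded the request, that byte may fall into unused
 * padding or into the bookkeeping of the neighbouring chunk; in the latter
 * case a later malloc()/free() can crash far away from the real bug. */

/* Corrected copy: reserve strlen(s) + 1 bytes for the terminator. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

Because a one-byte overrun often lands harmlessly in padding, this class of bug can stay silent under light load and only show up intermittently.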

Running out of memory would not cause malloc() to crash -- it would simply return NULL. That might cause your code to crash if you're not checking for NULL, but the crash site would not be in malloc().
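A common way to keep an out-of-memory condition from turning into a confusing crash elsewhere is to check every allocation at the call site. A minimal sketch (the wrapper name `xmalloc` is a widespread convention, but this particular implementation is hypothetical): it aborts right at the failing allocation, so the backtrace points there instead of at a later NULL dereference.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: malloc() signals exhaustion by returning NULL, not
   by crashing. Abort immediately so the failure shows up at the allocation
   site. A NULL result for a zero-byte request is allowed by the C standard
   and is not treated as an error. */
void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && n != 0) {
        fprintf(stderr, "xmalloc: out of memory allocating %zu bytes\n", n);
        abort();
    }
    return p;
}
```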

It's slightly strange that Valgrind is not reporting any errors, but there are some errors that the default "Memcheck" tool can miss. Try running Valgrind with the "Ptrcheck" tool instead.

Martin B
  • But shouldn't this have shown up under valgrind? (Assuming my test coverage was good enough.) – Gene Vincent Jun 23 '10 at 09:02
  • Your comment seems to have overlapped with my edit -- as suggested there, try running Valgrind with the "Ptrcheck" tool. If malloc() crashes, it's almost certain you're trashing the heap in some way. – Martin B Jun 23 '10 at 09:04
  • As of Valgrind release 3.7.0 (5 November 2011), the **exp-ptrcheck** tool has been renamed and scaled down in functionality to _check for overruns of stack and global arrays_. It is now called **exp-sgcheck** ("Stack and Global Array Checking"). [link](http://valgrind.org/docs/manual/dist.news.html) – Amar May 02 '13 at 17:34