7

I am writing a library that has too much code to post here.

My problem is a segmentation fault, which Valgrind reports as:

Jump to the invalid address stated on the next line
at 0x72612F656D6F682F: ???
at [...] (stack call)

Thanks to this question, I guess the cause is stack corruption somewhere.

My question is: how do I find it?
I tried using GDB, but the segmentation fault does not appear at the same place. GDB tells me it is on the first line of a function, while Valgrind says it is the call to this function that causes the segmentation fault.

Aracthor
  • Hunting down UB-related issues is no fun at all, and the heap/stack corruption sort can be really painful. I don't know a perfect methodical way to do this -- only a process of narrowing down suspects (e.g. temporarily omit sections of code from processing, eliminating suspects like an investigator). What I've found over the years is that I get fewer and fewer of these -- they're easier to prevent than to discover in hindsight. Asserting assumptions liberally, especially around dangerous code, can be useful. Doing rigorous testing whenever you involve low-level constructs can be a life saver. – Dec 02 '15 at 01:08
  • The rigor of your testing procedure should typically scale with the amount of low-level code you're writing -- if you are writing a low-level container or memory allocator, that needs a lot of unit testing. Anyway -- afraid this isn't so helpful for the immediate problem -- maybe someone has a really great way to debug these. – Dec 02 '15 at 01:09
  • @Ike Right, it was an experimental container that was overwriting data on my stack... Thanks for the advice. – Aracthor Dec 02 '15 at 01:47
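
The advice in these comments about asserting assumptions around low-level containers could look roughly like the sketch below. It is not the asker's container: `small_stack`, `SMALL_STACK_CAPACITY` and both functions are hypothetical names made up for illustration. The only point is that the write into the array is guarded, so a full container refuses the push instead of silently overwriting neighbouring stack data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_STACK_CAPACITY 8  /* hypothetical capacity, chosen for illustration */

/* Hypothetical fixed-capacity container, typically placed on the caller's stack. */
struct small_stack {
    int    items[SMALL_STACK_CAPACITY];
    size_t count;
};

void small_stack_init(struct small_stack *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Returns false when full instead of writing past the end of items[]. */
bool small_stack_push(struct small_stack *s, int value)
{
    assert(s != NULL);
    /* Invariant: count never exceeds the capacity. If this fires, the
       container state was already corrupted before this call. */
    assert(s->count <= SMALL_STACK_CAPACITY);

    if (s->count == SMALL_STACK_CAPACITY)
        return false;            /* refuse rather than overflow */

    s->items[s->count++] = value;
    return true;
}
```

A unit test that pushes one element more than the capacity and checks for `false` is the kind of cheap test that tends to catch this class of bug early.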

2 Answers

4

If the problem is repeatable, you can use a technique similar to this answer to set a watchpoint on the location of the return address, and have GDB stop on the instruction immediately following the one that corrupts it.

Employed Russian
  • @Aracthor And the point of the answer is to tell you precisely how to find that out. You *know* where the return address is stored, and you *know* it is correct on entry into the function. So set the watchpoint, and GDB will stop when that location is overwritten. – Employed Russian Dec 02 '15 at 01:28
  • But what is the *location of return address* that I should set as watchpoint? – Aracthor Dec 02 '15 at 01:31
  • If you have a non-optimized build, and if you are on `x86_64`, then when you `step` into the function (past the function prologue), the return address will be stored at `$rbp[1]`. If you do `x/2a $rbp`, you should see the previous `$rbp` and the address of the caller of your function. If you have an optimized build, you'll have to `disassemble` the function to find where the return address is stored. – Employed Russian Dec 02 '15 at 01:38
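
To make the `$rbp[1]` remark above a little more concrete, here is a minimal sketch, assuming GCC or Clang (it relies on their `__builtin_frame_address` and `__builtin_return_address` extensions) and a frame layout like the usual `x86_64` one. `frame_info` and `inspect_frame` are hypothetical names, not part of any library. The function only reports where the saved return address is expected to live; it never reads or writes that slot.

```c
#include <stddef.h>

/* Hypothetical helper type, for illustration only. */
struct frame_info {
    void **return_slot;  /* expected location of the saved return address */
    void  *return_addr;  /* return address as reported by the compiler */
};

/* noinline so the function keeps a stack frame of its own. */
__attribute__((noinline))
struct frame_info inspect_frame(void)
{
    struct frame_info info;

    /* On x86_64 with a frame pointer, frame[0] holds the caller's saved
       frame pointer and frame[1] the return address. This is an assumption
       about the ABI; other targets may lay the frame out differently. */
    void **frame = __builtin_frame_address(0);

    info.return_slot = frame + 1;
    info.return_addr = __builtin_return_address(0);
    return info;
}
```

In GDB, the equivalent of `return_slot` is what you would watch: after stepping past the prologue of the function whose return gets corrupted, something like `watch -l ((void **) $rbp)[1]` followed by `continue` should stop right after the instruction that overwrites the slot, provided the build keeps frame pointers.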
3

Since this is from years ago, you've probably figured out your bug. But for anyone who might stumble upon this, I would strongly encourage you to look into the "sanitizers".

If you're running Memcheck, you can probably run AddressSanitizer, which exists in both clang and gcc. AddressSanitizer can often detect stack corruption issues better than Memcheck. (Besides stack corruption, AddressSanitizer can detect many other kinds of addressing bugs.)
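
As a rough illustration of the kind of bug AddressSanitizer is good at, here is a small sketch. `fill_and_sum` and `BUF_LEN` are hypothetical names invented for this example. The function is fine for `count <= BUF_LEN` and writes past a stack array otherwise.

```c
#include <stddef.h>

#define BUF_LEN 4  /* hypothetical buffer size, for illustration */

/* Writes `count` values into a stack buffer without checking `count`. */
int fill_and_sum(size_t count)
{
    int buf[BUF_LEN];
    int sum = 0;

    for (size_t i = 0; i < count; ++i)
        buf[i] = (int)i;         /* BUG: count > BUF_LEN writes past buf */

    for (size_t i = 0; i < count && i < BUF_LEN; ++i)
        sum += buf[i];

    return sum;
}
```

Built with `-fsanitize=address -fno-omit-frame-pointer -g`, a call such as `fill_and_sum(64)` should be reported as a stack-buffer-overflow at the offending write, whereas without instrumentation it may only show up later as a jump to a garbage address.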

However, if you scroll back in your Memcheck log, you might see `Conditional jump or move depends on uninitialised value(s)`, in which case you're using an uninitialized variable, which is often harder to debug. For this, you can try MemorySanitizer (currently clang and Linux only, https://clang.llvm.org/docs/MemorySanitizer.html). In particular, look at the origin tracking options. This provides better origin tracking than Memcheck for uses of uninitialized variables. Do note, however, that MemorySanitizer is not trivial to set up, as it generally requires all external libraries to be built with (MemorySanitizer) instrumentation.
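
For a feel of what that Memcheck message corresponds to in code, here is a minimal sketch. `is_positive` is a hypothetical function written for this example: its local variable is only assigned on one path, so the comparison afterwards can depend on an uninitialised value.

```c
#include <stdbool.h>

/* `value` is only assigned when `initialise` is true. */
int is_positive(bool initialise)
{
    int value;                   /* deliberately left uninitialised */

    if (initialise)
        value = 5;

    /* With initialise == false, this branch depends on an uninitialised
       value, which is the pattern Memcheck and MemorySanitizer flag. */
    if (value > 0)
        return 1;
    return 0;
}
```

With clang, building with `-fsanitize=memory -fsanitize-memory-track-origins -fno-omit-frame-pointer -g` and calling `is_positive(false)` should produce a use-of-uninitialized-value report that also points at where `value` was declared. Optimized builds may remove the faulty branch altogether, so a low optimization level is the safer assumption when hunting for this.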

tdp2110