24

My program crashes with a segmentation fault when run normally. When I run it under GDB, though, it doesn't crash. Why might this happen?

I know that Valgrind's FAQ mentions this (not crashing under Valgrind), but I couldn't find anything on Google about the same behavior under GDB.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Sterling
  • 3,835
  • 14
  • 48
  • 73

10 Answers

16

I've had this happen to me before (you're not alone), but I can't remember what I did to fix things (I think it was a double free).

My suggestion would be to set up your environment to create core dumps, and then use GDB to investigate the core dump after the program crashes. In Bash, this is done with `ulimit -c size`, where size is the maximum core file size; I personally use 50000. Note that the unit depends on the shell mode: Bash counts in 1024-byte blocks by default (about 50 MB for 50000) and in 512-byte blocks in POSIX mode (about 25 MB).

You can then investigate the core dump by running `gdb program core`.
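
If changing the shell limit is inconvenient, the same limit can be raised from inside the program. The following is a minimal sketch, assuming a POSIX system; `enable_core_dumps` is a hypothetical helper name, not something from the original answer. It raises the soft core-size limit to the hard limit, which is roughly what `ulimit -c` does from the shell.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit for the current process. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* The soft limit may be raised up to, but not beyond, the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling this at the start of `main` should be enough for the process and its children. If the hard limit is already 0, no core file will be written, and where the file ends up depends on system configuration.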

kestrel
  • 1,314
  • 10
  • 31
  • 1
    +1 usefulness. In my case, it turned out to be an uninitialized member pointer. Normally, the pointer would contain garbage, so my if(bob) delete bob; code would crash, but under GDB I lucked out and got 0 for the value so the program ran normally. – sirbrialliance May 28 '12 at 19:13
9

It sounds like a Heisenbug you have there :-)

If the platform you're working with is able to produce core files, it should be possible to use the core file and GDB to pinpoint the location where the program crashes. A short explanation can be found here.

Let it crash a couple of times though. When the crash is caused by stack smashing or variable overwriting, the bug may seem to "walk around".

fvu
  • 32,488
  • 6
  • 61
  • 79
4

Try attaching to the running process within gdb, continuing, and then reproducing the crash. In other words, don't start the program within gdb; instead, start the program normally and then attach <pid>.

Sometimes when stepping through lines individually, a race condition that causes the program to crash will not manifest, as the race hazard has been eliminated or made exceedingly improbable by the "lengthy" pauses between steps.
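
To make attaching easier, the program can print its own PID and pause briefly at startup. This is only a sketch under assumptions: `format_attach_hint` and `wait_for_debugger` are hypothetical helper names, and the hint uses `gdb -p <pid>`, which attaches GDB to a running process from the shell.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a "gdb -p <pid>" command line for the current
 * process into buf. Returns the number of characters written (excluding
 * the terminator), or -1 if buf was too small. */
int format_attach_hint(char *buf, size_t size)
{
    int n = snprintf(buf, size, "gdb -p %ld", (long)getpid());
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: print the hint, then sleep so there is time to
 * attach before the interesting code runs. */
void wait_for_debugger(unsigned seconds)
{
    char hint[64];

    if (format_attach_hint(hint, sizeof hint) > 0)
        fprintf(stderr, "attach with: %s\n", hint);
    sleep(seconds);
}
```

A call such as `wait_for_debugger(10)` at the top of `main` gives a ten-second window; after attaching, `continue` lets the program run on until it crashes.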

Daniel Trebbien
  • 38,421
  • 18
  • 121
  • 193
  • 1
    I'm not sure what you mean by "start the program normally and then attach <pid>." Just add 'attach <pid>' like it's a command line arg? – Sterling Sep 21 '11 at 23:08
  • @Sterling: `attach <pid>` is a `gdb` command. Suppose that your program is called `my_program`. From a shell (or command) prompt, run it with `./my_program`. Open another shell prompt, start `gdb`, determine the PID of the running instance of `my_program`, and at `gdb`'s prompt, type `attach <pid>`. `gdb` will attach to the process and pause it. You can then set breakpoints and/or `continue`. When `my_program` segfaults, `gdb` will let you see exactly where it crashed. – Daniel Trebbien Sep 22 '11 at 15:40
  • this trick worked for me to find the problem in the program. – Abhishek Sagar Jul 26 '20 at 19:14
3

Well, I tracked it down to a `pthread_detach` call. I was calling `pthread_detach(&thethread)`. I removed the address-of operator, changing the call to `pthread_detach(thethread)`, and it worked fine. I'm not positive, but maybe it was a double free caused by detaching through the address and then destroying the thread again when it went out of scope?
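
For reference, a minimal sketch of the corrected call with both return values checked. The `worker` function and the `spawn_detached` name are hypothetical; the real thread body is not shown in this answer.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker standing in for the real thread body. */
void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create a thread and detach it, checking both return values.
 * Returns 0 on success or the pthread error number on failure. */
int spawn_detached(void)
{
    pthread_t thethread;
    int err = pthread_create(&thethread, NULL, worker, NULL);

    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return err;
    }

    /* pthread_detach takes the pthread_t by value, not its address. */
    err = pthread_detach(thethread);
    if (err != 0)
        fprintf(stderr, "pthread_detach: %s\n", strerror(err));
    return err;
}
```

The pthread functions return the error number directly rather than setting `errno`, which is why the result is passed to `strerror`.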

Sterling
  • 3,835
  • 14
  • 48
  • 73
  • 1
    You should not pass `pthread_t*` to `pthread_detach` call. Also check for return value of `pthread_detach`. Particularly check for `ESRCH` error. [man pthread_detach](http://man7.org/linux/man-pages/online/pages/man3/pthread_detach.3.html). – ks1322 Sep 22 '11 at 11:52
  • Well my pthreads are just global variables on the stack, not pthread*. – Sterling Sep 28 '11 at 15:09
  • 1
    You should pass your pthreads to `pthread_detach` as you did in this answer. By passing **address** of `thethread` variable, I guess `pthread_detach` fails with `ESRCH` error and program crashes somewhere later. You can easily check the return value to confirm this. – ks1322 Sep 28 '11 at 15:32
1

If a bug depends on timing, running under GDB can change the timing enough to keep it from reproducing.

the.malkolm
  • 2,391
  • 16
  • 16
1

Check for the return value of the pthread_detach call. According to your answer, you are probably passing an invalid thread handle to pthread_detach.

ks1322
  • 33,961
  • 14
  • 109
  • 164
  • I checked the return value and got no errors. (A little late on the reply; I just forgot about this thread.) – Sterling Oct 05 '11 at 15:42
0

I also had this happen to me sometimes.

My solution: clean & rebuild everything.

I am not saying that this always solves all problems (and in the OP's case, the problem was something really wrong), but you can save yourself some trouble and time if you do this first when encountering such really weird "meta" bugs.

At least in my experience, such things more often than not come from old object files that should have been rebuilt but were not. I've seen this with both MinGW and regular GCC.

TheSHEEEP
  • 2,961
  • 2
  • 31
  • 57
0

I just had a similar problem. In my case, it was connected to pointers in my linked list data structure. When I dynamically created a new list without initializing all the pointers inside the structure, my program crashed outside GDB.

Here are my original data structures:

#include <stdlib.h>

/* node is defined first so that list can refer to it */
typedef struct list_node {
    char *string;
    struct list_node *next;
} node;

typedef struct linked_list {
    node *head;
    node *tail;
} list;

When I created a new "instance" of a list without initializing its head and tail, the program crashed outside GDB:

list *createList(void) {
    list *newList = (list *) malloc(sizeof(list));
    if (newList == NULL) return NULL;

    /* head and tail are left uninitialized here */
    return newList;
}

Everything started to work normally after I changed my createList function to this:

list *createList(void) {
    list *newList = (list *) malloc(sizeof(list));
    if (newList == NULL) return NULL;

    /* start with an empty list */
    newList->head = NULL;
    newList->tail = NULL;

    return newList;
}

I hope this helps someone who runs into a similar problem with uninitialized pointers.
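
An alternative that avoids forgetting a field is to allocate with `calloc`, which zero-fills the memory. This is a sketch, not the fix used above, and `createListZeroed` is a hypothetical name. It relies on all-zero bytes representing a null pointer, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <stdlib.h>

typedef struct list_node {
    char *string;
    struct list_node *next;
} node;

typedef struct linked_list {
    node *head;
    node *tail;
} list;

/* Hypothetical variant of createList: calloc zero-fills the allocation,
 * so head and tail start out as null pointers on platforms where a null
 * pointer is represented by all-zero bytes. Returns NULL on failure. */
list *createListZeroed(void)
{
    return calloc(1, sizeof(list));
}
```

Explicit assignment, as in the corrected `createList`, remains the strictly portable choice.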

Jack
  • 11
  • 2
0

I faced a similar issue, where a thread was killed randomly, and a core dump was not created. When I attached GDB, the issue wouldn't reproduce.

To answer your question of why this happens: I think it is a timing issue. GDB collects data about thread execution, which can slow the process down. When thread execution is slowed like this, the issue may no longer reproduce.
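
As an illustration of a timing-dependent bug (not the code from my case), here is a small sketch with two threads incrementing a shared counter. All names are hypothetical. The unsynchronized version is a data race, so its result depends on how the threads interleave; the mutex version always reaches the expected total.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unsynchronized increment: a data race, so updates can be lost depending
 * on how the two threads happen to interleave. */
void *racy_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        counter++;
    return NULL;
}

/* Same loop, but each increment is protected by a mutex. */
void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run two threads with the given worker and return the final count,
 * or -1 if a thread could not be created. */
long run_two_threads(void *(*fn)(void *))
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, fn, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, fn, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With `locked_worker` the result is always `2 * ITERATIONS`. With `racy_worker` it can come out lower on some runs and correct on others, which is the kind of behavior that a debugger's slowdown can hide.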

Gangadhar
  • 45
  • 6
-3

When you run your code with GDB, it gets moved around. Now the illegal address you tried to reference before—the one that caused the segfault—is legal all of a sudden. It's a pain, for sure.

But the best way I know of to track down this kind of error is to start putting in printf()s all over the place, gradually narrowing it down.

Pete Wilson
  • 8,610
  • 6
  • 39
  • 51
  • No _code_ gets "moved around" but data does. If you don't believe me, try printing the result of malloc() in GDB and not in GDB. Inside GDB you consistently get 0x601010 for a 32-byte allocation, but "random" values when run normally. – kestrel Sep 21 '11 at 22:29
  • 11
    It's the other way around: GDB by default *disables* address randomization, so under GDB the data does *not* move around (which is usually what you want while debugging). This disabling of randomization can in fact make a bug disappear. You can enable randomization under GDB with `set disable-address-randomizaiton off` – Employed Russian Sep 22 '11 at 02:42
  • Thanks Employed Russian. The right command is `set disable-randomization off`. And it did allow me to replicate my crash in gdb. – Rémi Jan 08 '20 at 19:27
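
To check from inside the program whether randomization has been switched off for the current process, something like the following sketch may help. It is Linux-only, and `aslr_disabled` is a hypothetical helper name. It assumes the debugger turns randomization off through the per-process `ADDR_NO_RANDOMIZE` personality flag, which as far as I know is how GDB does it on Linux; it does not look at the system-wide setting.

```c
#include <sys/personality.h>

/* Hypothetical helper (Linux-only): returns 1 if the ADDR_NO_RANDOMIZE
 * personality flag is set for the current process, 0 if it is not,
 * and -1 if the query failed. */
int aslr_disabled(void)
{
    /* Passing 0xffffffff queries the current persona without changing it. */
    int persona = personality(0xffffffff);

    if (persona == -1)
        return -1;
    return (persona & ADDR_NO_RANDOMIZE) ? 1 : 0;
}
```

Printing the result at startup, once under GDB and once from the shell, is a quick way to see whether the two runs really differ in this respect.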