A legacy program most likely enters an infinite loop on certain pathological inputs. I have more than 1000 such instances, but I suspect that the vast majority of them trigger the same bug. Therefore, I would like to reduce these instances to the fundamentally different ones. The first step is to pause the application after, say, 10 seconds and collect the backtrace.
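
The reduction step described above could be sketched as follows. This is only an illustration under assumptions: it treats two instances as "the same" when their gdb `bt` output lists the same function names in the same order, and the names `signature` and `group_by_backtrace`, as well as the frame regular expression, are hypothetical helpers rather than anything from gdb itself.

```python
import re
from collections import defaultdict

# Matches gdb frame lines such as
#   "#1  0x0000000000400f0a in main () at dummy.cpp:20"
# and captures the function name; the address part is optional.
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.+?)\s+\(")


def signature(bt_text):
    """Return the function names of a backtrace, innermost frame first."""
    names = []
    for line in bt_text.splitlines():
        match = FRAME.match(line)
        if match:
            names.append(match.group(1))
    return tuple(names)


def group_by_backtrace(backtraces):
    """Map each distinct signature to the list of instances that produced it.

    `backtraces` maps an instance name to the raw text printed by `bt`.
    """
    groups = defaultdict(list)
    for instance, bt_text in backtraces.items():
        groups[signature(bt_text)].append(instance)
    return dict(groups)
```

One representative per group is then enough to investigate. Because a program stuck in a loop can be stopped at different depths, comparing only the outermost few frames may be a better notion of "same bug" than comparing the full stack.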

If I run:

gdb --batch --command=backtrace.txt --args ./legacy_program

where backtrace.txt contains

run
bt

and hit Ctrl + C in the same terminal after 10 seconds, I get exactly the backtrace I want.

Now, I would like to do that automatically. I have tried sending SIGINT (the expected equivalent of Ctrl + C) from another terminal, but then I no longer get the backtrace. Here are some of my failed attempts, based on the question "GDB how to stop execution without a breakpoint?". Neither of these has any effect:

pkill -SIGINT gdb 
kill -SIGINT 5717 

where 5717 is the pid of the only gdb running. Sending SIGINT to the legacy_program in the same way does kill it, but then I do not get the backtrace:

Program received signal SIGINT, Interrupt.
Quit

How can I programmatically pause the execution of the legacy_program after 10 seconds and get a backtrace?


This post was motivated by my frustration at not being able to find an answer to this question here on Stack Overflow. Also note that [it is not merely OK to ask and answer your own question, it is explicitly encouraged.](https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/)
Ali
  • What OS and version of gdb are you using? Your example works fine with gdb 7.7.1 on Ubuntu 14.04.3 if I send SIGINT to the target process (I tried `/bin/sleep`); I get a stack trace, and then gdb exits. I'm not sure what sending SIGINT to gdb ought to do aside from perhaps aborting any partially-entered gdb command and going back to the `(gdb)` prompt. – Mark Plotnick Feb 24 '16 at 17:11
  • @MarkPlotnick I am using gdb 7.7.1 on Ubuntu 14.04.3 too. :) When I press Ctrl + C at the terminal (when I get the backtrace that I want), which program gets this signal? `gdb` or the `legacy_program`? But it is apparent from the answer that the `legacy_program` should get the signal. – Ali Feb 24 '16 at 17:38
  • gdb puts the target in a new pgrp, and puts the terminal in the same pgrp, so when you type ^C, it goes to the target (and to any offspring of the target). If you use `/bin/sleep` instead of `./legacy_program` and have `run 60` instead of `run` in backtrace.txt, do you see a backtrace? It works for me whether I type ^C or send a SIGINT to the target with a kill command. – Mark Plotnick Feb 24 '16 at 19:25
  • @MarkPlotnick With `/bin/sleep` I do get a backtrace. Interesting. The `legacy_program` does not mess with the signal handlers; it is a 100% standard, platform independent C++ application. – Ali Feb 24 '16 at 19:36
  • Cool, we've got a mystery, then. Could you run `strace`, rather than `gdb`, on your legacy program and see whether any handler is called when you type ^C ? – Mark Plotnick Feb 24 '16 at 19:55
  • @MarkPlotnick Here is the [source code](https://gist.github.com/baharev/d43fc0d6aa523355ab2d) of a C++ program that behaves the same way as the legacy app. As you can see, it does not mess with the signal handlers (nor does the legacy app). I am happy to run it with `strace` but please give me the precise command and flags you need. – Ali Feb 24 '16 at 20:25
  • I compiled and ran your dummy.cpp program on Ubuntu 14.04.3, both 64-bit and 32-bit, with the given compilation command and gdb command script, and I saw a stack trace when either sending a SIGINT via a kill command or by typing ^C. What version of gcc/g++ are you using? – Mark Plotnick Feb 24 '16 at 22:34
  • Please run `strace -f -o outputfile.txt ./legacy_program`, type ^C, and look at the last few lines of `outputfile.txt`. If they're anything other than `--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} --- +++ killed by SIGINT +++`, let us know. What I'm curious about is where the `Quit` string in your output came from. Maybe your program got a SIGQUIT rather than SIGINT, or it caught the SIGINT and printed "Quit". – Mark Plotnick Feb 24 '16 at 22:39
  • @MarkPlotnick It's late at night here and past my bedtime. Therefore, it's better for both of us if I do not start debugging now. However, I was wrong: the legacy app does mess with the signal handlers, unfortunately. It enables floating-point exceptions (`SIGFPE`). I will have a busy day tomorrow, and I won't be able to look into this issue until the evening. I will let you know. In any case, I appreciate your feedback and efforts. – Ali Feb 25 '16 at 00:11
  • @MarkPlotnick Oh, one more thing: I do not run the legacy app directly. It is executed as a subprocess of a Python script. Maybe that matters... :( – Ali Feb 25 '16 at 00:14
  • @MarkPlotnick I will try to put together an [SSCCE](http://sscce.org/). I suspect that the python interpreter in the middle complicates things... I will let you know. – Ali Feb 25 '16 at 00:39
  • @MarkPlotnick Today was the first day that I had some time to debug this issue and put together an SSCCE. Unfortunately, no matter what I do, I cannot put together a small example to reproduce the behavior of the legacy application. If I send it `SIGSTOP` all is fine, but it seems to swallow `SIGINT`s as nothing happens. I am out of both time and ideas at the moment. Apparently, I was wrong and it is not `gdb`'s fault. – Ali Feb 27 '16 at 19:34

1 Answer

Apparently, it is a known (bug) feature in gdb; see the question "GDB is not trapping SIGINT. Ctrl+C terminates program when should break gdb." Try sending SIGSTOP instead from the other terminal:

pkill -STOP legacy_program 

It works on my machine.
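
The SIGSTOP approach can also be driven from a script instead of a second terminal. The following is a minimal sketch rather than a tested recipe for every setup. It assumes that `gdb` and `pkill` are on the PATH, that the debuggee's process name is the basename of the executable (pkill only sees the first 15 characters of a process name), and that no other process with that name is running. The helper names `gdb_batch_command`, `stop_debuggee`, and `backtrace_after` are made up for illustration.

```python
import os
import subprocess
import threading


def gdb_batch_command(program, program_args=()):
    """Build a gdb call that runs the program and prints a backtrace when it stops."""
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt",
            "--args", program, *program_args]


def stop_debuggee(program):
    """Send SIGSTOP to the debuggee (not to gdb), selected by process name."""
    subprocess.run(["pkill", "-STOP", os.path.basename(program)], check=False)


def backtrace_after(program, program_args=(), delay=10.0):
    """Run the program under gdb, stop it after `delay` seconds, return gdb's output."""
    gdb = subprocess.Popen(gdb_batch_command(program, program_args),
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                           text=True)
    # The timer fires in the background while communicate() drains the output,
    # so a chatty program cannot block on a full pipe.
    timer = threading.Timer(delay, stop_debuggee, args=(program,))
    timer.start()
    try:
        output, _ = gdb.communicate()
    finally:
        timer.cancel()  # no-op if the signal was already sent
    return output
```

Calling `backtrace_after("./legacy_program", delay=10.0)` should then return the same text that gdb prints in the terminal. If the program finishes on its own before the delay elapses, the timer is cancelled and no signal is sent.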

Note that you do not have to run the legacy_program in the debugger. Enable core dumps

ulimit -c unlimited 

and send the program SIGTRAP to make it crash, then get the backtrace from the core dump. So, start the program:

./legacy_program

From another terminal:

pkill -TRAP legacy_program 

The backtrace can be obtained like this:

gdb --batch -ex=bt ./legacy_program core
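
The core dump variant can be scripted in a similar way. This is a hedged sketch under several assumptions: it raises the soft core-size limit only up to the existing hard limit (which matches `ulimit -c unlimited` only if the hard limit is already unlimited), and it assumes that the kernel writes the dump to a file named `core` in the working directory, which depends on the system's core pattern setting. The function names are hypothetical.

```python
import resource
import signal
import subprocess


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit (runs in the child)."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def crash_after(command, delay=10.0):
    """Start `command`; if it is still running after `delay` seconds, send SIGTRAP.

    Returns the exit status; a negative value means death by that signal number.
    """
    proc = subprocess.Popen(command, preexec_fn=allow_core_dumps)
    try:
        proc.wait(timeout=delay)          # the program finished by itself
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTRAP)  # default action: terminate and dump core
        proc.wait()
    return proc.returncode


def core_backtrace_command(program, core_file="core"):
    """Build the gdb call that prints the backtrace stored in a core file."""
    return ["gdb", "--batch", "-ex", "bt", program, core_file]
```

After `crash_after(["./legacy_program"])` returns, running the list from `core_backtrace_command("./legacy_program")` through `subprocess.run` should print the backtrace, provided a core file was actually written.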
Ali