I am developing an application on Linux where I want to capture a backtrace of all running threads at a fixed frequency. To do this, my user-defined SIGUSR1 signal handler (installed for all threads) calls backtrace().

I am getting a crash (SIGSEGV) in my signal handler that originates from the backtrace() call. I have passed the correct arguments to the function, as documented at http://linux.die.net/man/3/backtrace.

What could make backtrace() crash in this case?

To add more details:

What leads me to conclude that the crash is inside backtrace() is frame 14 below. onMySignal is the SIGUSR1 signal handler, and it calls backtrace().

Sample code from onMySignal (copied from the Linux documentation of backtrace):

pthread_mutex_lock( &sig_mutex );

int j, nptrs;
#define SIZE 100
void *buffer[100] = {NULL};//or void *buffer[100];
char **strings;
nptrs = backtrace(buffer, SIZE);
pthread_mutex_unlock( &sig_mutex );

(gdb) where
#0  0x00000037bac0e9dd in raise () from 
#1  0x00002aaabda936b2 in skgesigOSCrash () from 
#2  0x00002aaabdd31705 in kpeDbgSignalHandler () 
#3  0x00002aaabda938c2 in skgesig_sigactionHandler () 
#4  <signal handler called>
#5  0x00000037ba030265 in raise () from 
#6  0x00000037ba031d10 in abort () from 
#7  0x00002b6cef82efd7 in os::abort(bool) () from 
#8  0x00002b6cef98205d in VMError::report_and_die() ()
#9  0x00002b6cef835655 in JVM_handle_linux_signal () 
#10 0x00002b6cef831bae in signalHandler(int, siginfo*, void*) ()
#11 <signal handler called>
#12 0x00000037be407638 in ?? () 
#13 0x00000037be4088bb in _Unwind_Backtrace () 
#14 0x00000037ba0e5fa8 in backtrace () 
#15 0x00002aaaaae3875f in onMySignal (signum=10,info=0x4088ec80, context=0x4088eb50)   
#16 <signal handler called>
#17 0x00002aaab4aa8acb in mxSession::setPartition(int)
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb)

Hope this makes the issue clearer.

@janneb I wrapped the signal handler implementation in a mutex lock for better synchronization.

@janneb I did not find anything in the documentation specifying whether backtrace_symbols/backtrace are async-signal-safe, or whether they should be used in a signal handler.

Still, I removed backtrace_symbols from my signal handler and don't use it anywhere, but my actual problem, the crash in backtrace(), persists, and I have no clue why it is crashing.

Edit 23/06/11: more details:

(gdb) where
#0  0x00000037bac0e9dd in raise () from 
#1  0x00002aaab98a36b2 in skgesigOSCrash () from 
#2  0x00002aaab9b41705 in kpeDbgSignalHandler () from 
#3  0x00002aaab98a38c2 in skgesig_sigactionHandler () from 
#4  <signal handler called>
#5  0x00000037ba030265 in raise () from 
#6  0x00000037ba031d10 in abort () from 
#7  0x00002ac003803fd7 in os::abort(bool) () from
#8  0x00002ac00395705d in VMError::report_and_die() () from 
#9  0x00002ac00380a655 in JVM_handle_linux_signal () from 
#10 0x00002ac003806bae in signalHandler(int, siginfo*, void*) () from 
#11 <signal handler called>
#12 0x00000037be407638 in ?? () from libgcc_s.so.1
#13 0x00000037be4088bb in _Unwind_Backtrace () from libgcc_s.so.1
#14 0x00000037ba0e5fa8 in backtrace () from libc.so.6
#15 0x00002aaaaae3875f in onMyBacktrace (signum=10, info=0x415d0eb0, context=0x415d0d80)
#16 <signal handler called>
#17 0x00000037ba071fa8 in _int_free () from libc.so.6
#18 0x00000000000007e0 in ?? ()
#19 0x000000005aab01a0 in ?? ()
#20 0x000000000000006f in ?? ()
#21 0x00000037ba075292 in realloc () from libc.so.6
#22 0x00002aaab6248c4e in Memory::reallocMemory(void*, unsigned long, char const*, int) ()

The crash occurred while realloc was executing, and one of the frame addresses was 0x00000000000007e0, which looks invalid.

Hasturkun
sandeep
  • Could you add some code? Are you sure that `backtrace` is the exact spot where it is crashing? However, with `backtrace`, you can easily supply an invalid pointer which would cause this. – Mr. Shickadance Jun 16 '11 at 11:29
  • As an aside, your code is not async-signal-safe as you're calling backtrace_symbols() in your signal handler. – janneb Jun 17 '11 at 08:03
  • Adding a mutex and locking/unlocking it in the signal handler is not a solution. The solution is to only use async-signal-safe calls in the signal handler. – janneb Jun 20 '11 at 06:58
  • Have you removed the mutex calls? Also, did you try using the alternate signal stack? – Hasturkun Jun 23 '11 at 11:02

2 Answers


The documentation for signal handling defines the list of functions that are safe to call from a signal handler; you must not use any other functions, including backtrace. (Search for async-signal-safe in that document.)

What you can do is write to a pipe you have previously set up, and have a thread waiting on that pipe, which then does the backtrace.

EDIT:

OK, so the backtrace function returns the current thread's stack and therefore can't be used from another thread, which means my idea of using a separate thread to do the backtrace won't work.

Therefore, you could try backtrace_symbols_fd from your signal handler.
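
For illustration, here is a minimal sketch of what a handler built on backtrace_symbols_fd could look like. The names dump_backtrace_to_fd, on_sigusr1, install_sigusr1_handler and g_dump_count are made up for this example and do not come from the question. It assumes glibc's <execinfo.h>. Since glibc documents backtrace() as AS-Unsafe, treat this as something to try rather than a guaranteed fix.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 100

/* Hypothetical counter so the rest of the program can see that a dump ran. */
static volatile sig_atomic_t g_dump_count = 0;

/* Hypothetical helper: write the calling thread's backtrace to fd.
 * The frame buffer lives on the stack and backtrace_symbols_fd() only
 * calls write(), so no malloc() is needed here.
 * Returns the number of frames captured. */
int dump_backtrace_to_fd(int fd)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}

/* Signal handler: no mutex, no malloc, no stdio. */
static void on_sigusr1(int signum)
{
    int saved_errno = errno;   /* write() may clobber errno */
    (void)signum;
    dump_backtrace_to_fd(STDERR_FILENO);
    g_dump_count++;
    errno = saved_errno;
}

/* Install the handler for SIGUSR1. Returns 0 on success, -1 on error. */
int install_sigusr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Calling install_sigusr1_handler() once at startup and then sending SIGUSR1 to a thread (for example with pthread_kill) should print that thread's frames to stderr. If only the raw addresses are needed, the backtrace_symbols_fd call can be dropped.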

As an alternative you could use gdb to get the backtrace, without having to have code in your program - and gdb can handle multiple threads easily.

Shell script to run gdb and get back traces:

#!/bin/bash
# Usage: pass a PID or a process name as the first argument.
PID="$1"
[ -d "/proc/$PID" ] || PID=$(pgrep "$1")
[ -d "/proc/$PID" ] || { echo "Can't find process: $PID" >&2 ; exit 1 ; }

[ -d "$TMPDIR" ] || TMPDIR=/tmp

# Write a gdb batch file that dumps every thread's backtrace, then quits.
BATCH=$(mktemp "$TMPDIR/pstack.gdb.XXXXXXXXXXXXX")
echo "thread apply all bt" >"$BATCH"
echo "quit" >>"$BATCH"
gdb "/proc/$PID/exe" "$PID" -batch -x "$BATCH" </dev/null
rm "$BATCH"
Douglas Leeder
  • Which thread do you mean should wait on the pipe? Would a backtrace in that thread give the call stack of the original thread for which the signal was raised and the handler invoked? Do you mean we should start a thread from the signal handler and have it do the backtrace? Who should start it, and where? What data should be written to the pipe? The flow of this scenario is quite unclear; could you please elaborate? – sandeep Jun 22 '11 at 12:10
  • I only need the instruction-pointer addresses of the stack frames, which can be obtained from backtrace() (its first parameter, buffer). Since I don't want to read symbol (function) names, I don't need backtrace_symbols_fd/backtrace_symbols either, and I have removed the backtrace_symbols call from my implementation. Since my crash originates in backtrace(), I need a solution for that. Also, because the buffer filled by backtrace() is used for further processing in my code, running gdb/pstack in a separate shell is not the right option for me. – sandeep Jun 23 '11 at 05:47
  • I also tried my own stack-walking algorithm, but, as people say, the end-of-stack condition is difficult to identify, and it failed sometimes. Somewhat relevant threads: [link](http://gcc.gnu.org/ml/gcc/2007-06/msg00329.html) [link](http://stackoverflow.com/questions/582673/is-there-a-cheaper-way-to-find-the-depth-of-the-call-stack-than-using-backtrace) – sandeep Jun 23 '11 at 06:01

As stated by Douglas Leeder, backtrace isn't on the list of signal-safe calls, though in this case I suspect the problem is the malloc done by backtrace_symbols. Try using backtrace_symbols_fd, which does not call malloc, only write. (Also drop the mutex calls; signal handlers should not sleep.)

EDIT

From what I can tell from the source for backtrace, it should be signal-safe itself, though it is possible that you are overrunning your stack.

You may want to look at glibc's implementation of libsegfault to see how it handles this case.

Hasturkun
  • Do you know of any function that would be safe to use for getting a backtrace? In fact, my implementation works in most cases and fails only in a few scenarios where server logins are necessary. I have also seen people suggest using backtrace() inside a SIGSEGV signal handler to identify the crash location, though my requirement is not a backtrace on crash but one taken periodically during execution. – sandeep Jun 22 '11 at 12:13
  • @sandeep: Is it possible that your server login code uses the stack extensively? If so, consider using an alternate signal stack via `sigaltstack`; it may fix the problem. – Hasturkun Jun 22 '11 at 15:45
  • I don't think sigaltstack is the appropriate method for me; my server login stack has only around 7-8 frames. In fact, I have now created a simple test program that repeatedly mallocs and frees memory in a function called in a loop. In this scenario, a crash similar to the earlier one occurs when malloc or free is on the stack and the signal handler is doing a backtrace. – sandeep Jun 29 '11 at 12:39
  • No, `backtrace()` is not in the [white list](http://man7.org/linux/man-pages/man7/signal-safety.7.html) of signal safe functions. – Leedehai Oct 11 '18 at 19:57
  • @Leedehai: `backtrace()` isn't specified by POSIX, so it shouldn't be surprising that it isn't on the POSIX list of async-signal-safe functions. Not being on that list doesn't mean a function isn't async-signal-safe, only that it isn't required to be. In any case, it should be safe to use as in libsegfault. – Hasturkun Oct 11 '18 at 20:21
  • @Hasturkun good point. But the GNU C [docs](https://www.gnu.org/software/libc/manual/html_node/Backtraces.html) for glibc also say `backtrace()` is "AS-Unsafe" (async-signal unsafe)... – Leedehai Oct 11 '18 at 20:25
  • @Leedehai: By the [backtrace manpage](http://man7.org/linux/man-pages/man3/backtrace.3.html), one possible issue is that calling `backtrace()` before libgcc is loaded may call `malloc()`, which can be fatal in a signal handler. Otherwise, AFAICT, the function itself is reentrant and uses no global state, so shouldn't cause problems. (Also, AFAICT, the reason it is documented as AS-Unsafe is the aforementioned loading of libgcc) – Hasturkun Oct 11 '18 at 20:33
  • @Hasturkun thanks! That manpage seems to suggest calling `backtrace()` at program start (outside of the signal handler) to "warm up" the data structure. Follow-up: hah, that's what Google's [Chromium](https://cs.chromium.org/chromium/src/base/debug/stack_trace_posix.cc) does. – Leedehai Oct 11 '18 at 20:41
  • @Leedehai: You could also just `dlopen()` libgcc. Though calling `backtrace()` at start will do that for you as well (so long as size >=1). edit: Given what Chromium does, calling `backtrace()` at start sounds safest. – Hasturkun Oct 11 '18 at 20:45
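
The two suggestions from these comments (warming up `backtrace()` at program start and running the handler on an alternate stack via `sigaltstack`) can be combined roughly as below. This is only a sketch under assumptions: `sampler_init`, `on_sample`, `g_frames`, `g_depth` and the 64 KiB stack size are invented for the example, and it has not been confirmed to fix the crash in the question. Note that `sigaltstack` applies to the calling thread only, and that the single global buffer here would have to become a per-thread buffer in a multi-threaded sampler.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAMES 100
#define ALT_STACK_BYTES (64 * 1024)   /* assumed size, tune as needed */

/* Hypothetical storage for the most recent sample (raw return addresses). */
static void *g_frames[MAX_FRAMES];
static volatile sig_atomic_t g_depth = 0;

/* Handler: only backtrace() into a preallocated buffer.
 * No locks, no malloc, no symbol lookup. */
static void on_sample(int signum)
{
    (void)signum;
    g_depth = backtrace(g_frames, MAX_FRAMES);
}

/* Call from normal (non-signal) context before any SIGUSR1 is sent.
 * Returns 0 on success, -1 on error. */
int sampler_init(void)
{
    /* Warm-up: per the backtrace(3) man page, the first call may load
     * libgcc, which can call malloc(). Do that here, not in the handler. */
    void *warm[1];
    backtrace(warm, 1);

    /* Give the calling thread an alternate stack for signal handlers. */
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = malloc(ALT_STACK_BYTES);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    /* SA_ONSTACK makes the handler run on the alternate stack. */
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK | SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

After `sampler_init()` succeeds, each delivered SIGUSR1 leaves the sampled addresses in `g_frames[0 .. g_depth-1]` for later processing outside the handler.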