I want to set up a signal handler for SIGSEGV, SIGILL and possibly a few other signals that, rather than terminating the whole process, just terminates the offending thread and perhaps sets a flag somewhere so that a monitoring thread can complain and start another thread.

I'm not sure there is a safe way to do this. Pthreads seems to provide functions for exiting the current thread, as well as canceling another thread, but these potentially call a bunch of at-exit handlers. Even if they don't, it seems as though there are many situations in which they are not async-signal-safe, although it is possible that those situations are avoidable. Is there a lower-level function I can call that just destroys the thread?

Assuming I modify my own data structures in an async-signal-safe way, and acquire no mutexes, are there pthread/other global data structures that could be left in an inconsistent state simply by a thread terminating at a SIGSEGV? malloc comes to mind, but malloc itself shouldn't SIGSEGV/SIGILL unless the libc is buggy.

I realize that POSIX is very conservative here, and makes no guarantees. As long as there's a way to do this in practice I'm happy. Forking is not an option, btw.

user2040142
  • Use a separate process. All threads in a process can access the same memory space, so any of them can corrupt internal data structures (e.g. those of malloc & co) before reaching the point where they cause a segfault. Using a process instead of a thread will give you proper separation. – Ulrich Eckhardt Sep 20 '14 at 16:02
  • `malloc` absolutely will `SIGSEGV` without bugs in `malloc`, if you've clobbered its data structures. This is your program's fault. – R.. GitHub STOP HELPING ICE Sep 20 '14 at 16:49

1 Answer

If the SIGSEGV/SIGILL/etc. happens in your own code, the signal handler does not run in an async-signal (AS) context: the signal is fundamentally synchronous, although the handler would still be in an AS context if the fault happened inside a standard library function. So you can legally call pthread_exit from the signal handler. However, there are still issues that make this practice dubious:

  • SIGSEGV/SIGILL/etc. never occur in a program whose behavior is defined unless you generate them via raise, kill, pthread_kill, sigqueue, etc. (and in some of these special cases, they would be asynchronous signals). Otherwise, they're indicative of a program having undefined behavior. If the program has invoked undefined behavior, all bets are off. UB is not isolated to a particular thread or a particular sequence in time. If the program has UB, its entire output/behavior is meaningless.

  • If the program's state is corrupted (e.g. due to access-after-free, use of invalid pointers, buffer overflows, ...) it's very possible that the first faulting access will happen inside part of the standard library (e.g. inside malloc) rather than in your code. In this case, the signal handler runs in an async-signal context, where only async-signal-safe functions may be called, so it cannot call pthread_exit. Of course the program already has UB anyway (see the above point), but even if you wanted to pretend that's not an issue, you'd still be in trouble.
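For illustration only, here is a minimal sketch of the one case described above as legal: the fault happens in the program's own code and the handler calls pthread_exit. The names `thread_crashed`, `install_fault_handler`, `faulty_worker` and `run_faulty_thread` are hypothetical and not from the question. The deliberate null store is itself undefined behavior, so the caveats in the two points above apply in full; the sketch assumes a typical Linux/glibc system, where this tends to work in practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag that a monitoring thread could poll. */
static atomic_int thread_crashed;

/* Record the crash and end only the faulting thread.
   Assumption: the fault occurred in our own code, not inside libc. */
static void fault_handler(int sig) {
    (void)sig;
    atomic_store(&thread_crashed, 1);
    pthread_exit(NULL);
}

/* Install the handler for SIGSEGV and SIGILL; returns 0 on success. */
int install_fault_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGILL, &sa, NULL);
}

/* Worker that faults on purpose in its own code. */
static void *faulty_worker(void *arg) {
    (void)arg;
    volatile int *bad = NULL;
    *bad = 42; /* deliberate invalid store (undefined behavior) */
    return NULL;
}

/* Start the faulty worker, wait for it, and report whether the
   handler ran: 1 if the crash was recorded, 0 if not, -1 on error. */
int run_faulty_thread(void) {
    pthread_t t;
    atomic_store(&thread_crashed, 0);
    if (install_fault_handler() != 0) return -1;
    if (pthread_create(&t, NULL, faulty_worker, NULL) != 0) return -1;
    if (pthread_join(t, NULL) != 0) return -1;
    return atomic_load(&thread_crashed);
}
```

Calling `run_faulty_thread()` from the main thread should leave the process alive with the flag set. Nothing here protects against the second point: if the fault occurs inside malloc or another library function, calling pthread_exit from the handler is no longer permitted.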

If your program is experiencing these kinds of crashes, you need to find the cause and fix it, not try to patch around it with signal handlers. Valgrind is your friend. If that's not possible, your best bet is to isolate the crashing code into separate processes where you can reason about what happens if they crash asynchronously, rather than having the crashing code in the same process (where any further reasoning about the code's behavior is invalid once you know it crashes).
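The process-isolation approach can be sketched roughly as follows. This is an assumption-laden outline rather than a full design: `run_isolated`, `crashing_task` and `healthy_task` are hypothetical names, and the crash is simulated with raise so that the example's own behavior stays defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run task() in a child process.
   Returns 0 if the child exited normally with status 0,
   the signal number if the child was killed by a signal,
   and -1 on any other failure. */
int run_isolated(void (*task)(void)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        task();
        _exit(0); /* leave the child without running atexit handlers */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (WIFSIGNALED(status)) return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return 0;
    return -1;
}

/* Stand-in for buggy code: terminates the child with SIGSEGV. */
static void crashing_task(void) {
    raise(SIGSEGV);
}

/* Stand-in for code that finishes without incident. */
static void healthy_task(void) {
}
```

A supervisor could call `run_isolated(crashing_task)`, see `SIGSEGV` come back, and start a replacement, while its own memory stays untouched by whatever the child did before dying.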

R.. GitHub STOP HELPING ICE
  • Perfectly answers my question. This is actually for a system that compiles things, dlopen's them, and then runs the code---livecoding in C. So mistakes are inevitable and the idea is just to do the least bad thing. It seems like, because the program may be doing basically whatever to random memory locations, there's no way to guarantee that a faulty thread won't bring down the rest. But at least the more likely outcome here would be that the threads that are good stay good, and the code can be fixed and recompiled without stopping the program. – user2040142 Sep 20 '14 at 18:13
  • For that kind of usage case, you really should be running the code in a separate process. The only way you're likely to have any success with that kind of approach is when the code you're running is accessing little or no shared data from the main program (and even then, buffer overflows could trash data from the main program), but such usage cases are exactly the ones that are easy to move into a separate process anyway. The cases that are hard to move to a separate process are ones with lots of sharing, and those are the ones where a buggy module is going to trash the whole program state. – R.. GitHub STOP HELPING ICE Sep 20 '14 at 21:43