In C++, is it possible to detect the unexpected termination of a thread?

Question

I have a background worker thread which is constantly processing data.
(created using std::thread)

If the thread runs out of data to process, it is designed to wait and then resume when more data becomes available for processing.
(using std::condition_variable)

If this thread ever terminates unexpectedly, I want to ensure the thread is restarted in a fresh state.

But I don't know how to detect thread termination, or react to it.
What's the best way to approach this scenario?

You could have an [at exit handler](https://stackoverflow.com/questions/20112221/invoking-a-function-automatically-on-stdthread-exit-in-c11). — tadman, Feb 16 '18 at 18:54
Are there ways to detect it? Yes. You can `std::join` the thread which will block until it returns (and then you can re-create it). You can keep an eye on what thread IDs exist for your current process and if the thread id of your worker goes away you can re-spawn it. There are other options as well. Basically the answer to your question is "yes". — Jesper Juhl, Feb 16 '18 at 19:02
Why would it be unexpected? Fix whatever it is that made it terminate. — Martin James, Feb 16 '18 at 19:07
@MartinJames This is a preventative measure, not a bugfix. I'm trying to design this to handle unexpected scenarios without crumbling, rather than assuming I can predict everything that could possibly happen. — Giffyguy, Feb 16 '18 at 19:28
@JesperJuhl Thanks. Joining the threads works, of course - but it also defeats the purpose of multi-threading. I want this thread to continue processing in the background, and I don't want any other threads to be waiting for it if they don't have to. Watching thread IDs is an interesting idea, I'll have to look into that. Utilizing `thread_local` storage might be the smoothest solution, mentioned by tadman — Giffyguy, Feb 16 '18 at 19:31
@Giffyguy if any thread can just terminate unexpectedly, your entire system is irretrievably borked and you should restart it:( A thread terminating without cause is a critical OS disaster that is irrecoverable without a reboot. You cannot join() or poll any thread to improve quality/reliability - what if it is the monitoring thread that terminates unexpectedly? — Martin James, Feb 16 '18 at 20:35
I mean, you're writing C++. If there is an exception, catch it before it escapes. If the exception is so serious as to be uncatchable, then your app is into UB anyway. — Martin James, Feb 16 '18 at 20:41
@MartinJames It's irresponsible to assume any code can be guaranteed bug-free. Therefore it's not smart to assume I can permanently "fix" any code to run perfectly forever. Throwing my hands in the air and accepting "disaster," is equally ridiculous - my code should be able to recover from such problems. In the event of an unhandlable exception, the thread will exit prematurely (by design), and my exit handler needs to detect this immediately, and restart the failed background thread with a fresh, clean state. This is the proper method of resolving the _"irrecoverable, critical OS disaster."_ — Giffyguy, Feb 16 '18 at 20:58
@Giffyguy 'my exit handler needs to detect this immediately, and restart the failed background thread with a fresh, clean state' umm.. what context do you expect the exit handler to run in? What if that is the thread that has suffered the 'unhandleable exception'? You are just adding avoidable complication to fix a problem that you're not sure exists and, if it does, you should fix the root cause, not pile on band-aids that cannot be sure to cover up the damage already done to other data. If a thread can raise an unhandled exception that can escape, you need to fix it:( — Martin James, Feb 17 '18 at 14:35
@MartinJames The exit handler obviously needs to execute in the context of the main thread, not the potentially crashed background thread itself. I haven't detailed situations that might cause this background thread to crash, and I'm not going to now. You're fighting a ridiculous battle here. I'm aware that bugs need to be fixed. I'm also aware that robust code should be able to handle unexpected errors without critical failure. Neither of these are mutually exclusive. This is not a band-aid to avoid bugfixes - it's a stabilizing measure to keep the software running in extreme situations. — Giffyguy, Feb 17 '18 at 17:44

score 1 · Accepted Answer · answered Feb 16 '18 at 21:26

Threads can only exit in a controlled way: either the thread itself decides to exit, or it gets cancelled from another thread. The only other possibility is a crash in your program or some external killer event.

The situation changes if you run separate processes rather than threads. In that case, yes, a process could exit unexpectedly, and you need some way to find out whether it still exists.

If you expect that some code in your thread can cause an unexpected exit, you might just instantiate a guard class with a destructor. The destructor can do something to notify your system that the thread is exiting:

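    void restartMe();  // assumed to be defined elsewhere: restarts the thread or reports its exit

    struct WatchDog {
        // Runs when the scope of the thread function unwinds.
        ~WatchDog() {
            restartMe();
        }
    };

    void start() {
        WatchDog watcher;
        ...
    }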

You might try to restart your thread from the destructor, or just notify yet another watcher thread that you need to restart this one.
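The second option, notifying a separate watcher, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answer's actual code: `ExitSignal`, `finishedNormally`, `worker` and `runSupervised` are hypothetical names, the failure is simulated with a thrown `std::runtime_error`, and the supervisor is assumed to run on its own thread so that blocking there does not stall anything else. The guard lives inside a catch-all because an exception that escapes the thread function calls `std::terminate`, and the guard's destructor is not guaranteed to run in that case.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical shared state between one worker run and its supervisor.
struct ExitSignal {
    std::mutex m;
    std::condition_variable cv;
    bool exited = false;            // set by the guard's destructor
    bool finishedNormally = false;  // set by the worker just before a clean return
};

// RAII guard: the destructor runs when the worker's scope unwinds, on a
// normal return or on an exception that is caught within the same thread.
class WatchDog {
public:
    explicit WatchDog(ExitSignal& sig) : sig_(sig) {}
    ~WatchDog() {
        {
            std::lock_guard<std::mutex> lock(sig_.m);
            sig_.exited = true;
        }
        sig_.cv.notify_one();
    }
    WatchDog(const WatchDog&) = delete;
    WatchDog& operator=(const WatchDog&) = delete;

private:
    ExitSignal& sig_;
};

// Stand-in for the real processing loop; `shouldFail` simulates a bug.
void worker(ExitSignal& sig, bool shouldFail) {
    try {
        WatchDog watcher(sig);
        if (shouldFail) {
            throw std::runtime_error("simulated failure");
        }
        std::lock_guard<std::mutex> lock(sig.m);
        sig.finishedNormally = true;
    } catch (...) {
        // Swallow the exception: letting it escape the thread function
        // would call std::terminate and end the whole process.
    }
}

// Runs the worker, restarting it with fresh state after each unexpected
// exit. Returns how many times the worker was started.
int runSupervised(int failuresBeforeSuccess) {
    assert(failuresBeforeSuccess >= 0);
    int starts = 0;
    for (;;) {
        ExitSignal sig;  // fresh state for every run
        const bool shouldFail = starts < failuresBeforeSuccess;
        std::thread t(worker, std::ref(sig), shouldFail);
        ++starts;
        bool ok = false;
        {
            std::unique_lock<std::mutex> lock(sig.m);
            sig.cv.wait(lock, [&] { return sig.exited; });
            ok = sig.finishedNormally;
        }
        t.join();  // the worker has already left its scope, so this returns promptly
        if (ok) {
            return starts;
        }
    }
}
```

Calling `runSupervised(2)` starts the worker three times: two simulated failures followed by one clean run. Note that the guard cannot help if the thread is killed without unwinding its stack, for example by a hard crash of the whole process.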

Of course, for separate processes you would need a completely different approach, e.g. heartbeat pings or something similar.

Serge


This is the approach I'm probably going to use. The watchdog object might also have `thread_local` storage, depending on my needs. I'm also wrapping the entire chunk of code the thread executes, inside a giant catch-all, so any unhandled exception will result in a clean exit + restart. – Giffyguy Feb 16 '18 at 21:37
@Giffyguy how is 'clean exit + restart' any different than a 'while (true)' loop at the top level, surrounding your 'catch all'? – Martin James Feb 17 '18 at 14:43
@MartinJames catch all does not catch normal exits from the thread. i.e. pthread_exit – Serge Feb 17 '18 at 15:49
@MartinJames My code includes a `while(true)` loop, along with `std::condition_variable` so the background thread can pause and wait for more data to process, if needed. The try/catch-all encloses the entire loop, not the other way around. I'm not sure what you think is going on here, but you seem to be unnecessarily confused about this scenario ... – Giffyguy Feb 17 '18 at 17:48
