
I built a program that should be released to production soon, but I'm worried that if all threads end up locked or waiting, the pipeline will be compromised. I'm fairly sure I designed it so this can't happen, but if it does, I'd like to kill all the threads and produce a boilerplate output. My first idea was to add a thread that monitors the iterations of all the other threads and kills them if no iteration occurs for 5 seconds. This doesn't seem to work, and there's also the problem that the threads would be killed in some arbitrary state of execution:

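void deadlock_monitor() {
    // last_thread_iter and processing_completed are shared with the worker
    // threads; their declarations are not shown.
    while(true) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        // time_diff is in microseconds; it is converted to milliseconds below.
        int64_t time_diff = gnut_GetMicroTime() - last_thread_iter;
        if(((time_diff/1000) > 5000) && !processing_completed) {
            exit(1);  // no iteration for more than 5 seconds
        }
        if(processing_completed) {
            return;
        }
    }
}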

Is there a best practice to deal with this, or is ensuring there are no race conditions all I can do?

  • 4
    Killing threads from inside the program can leave the program in an unstable state, cause data corruption, etc. Usually an external watchdog program is used to monitor the progress of the main program. If the watchdog program detects that the main program is no longer making progress, it kills the main program's process and relaunches it. – Richard Critten Nov 15 '22 at 18:07
  • 2
    The shown code does not attempt any synchronization of this mysterious `processing_completed` object, whatever it is. A sufficiently optimizing compiler can theoretically compile this whole thing down to `if (processing_completed) return; /* infinite loop */`. And, unfortunately, there is no "best practice", whatever that means, for something like this. No two multithreaded C++ programs are alike, in terms of their behavior. – Sam Varshavchik Nov 15 '22 at 18:09
  • Depending on what kind of work your threads are doing (e.g. repeated loops), you can add heartbeats and/or running stats to check that everything is normal, and stop the thread otherwise. If you actually have to kill something, the way to go would be as @RichardCritten described. Note the difference between threads and processes. – Cedric Nov 15 '22 at 18:09
  • processing_completed becomes true after all the threads exit and return execution to main. I put it in so there won't be an infinite loop on proper execution of main. – Patrick McKeever Nov 15 '22 at 18:10
  • Is last_thread_iter a std::atomic? It is not protected in the loop, so the compiler may assume it doesn't change if it's not atomic. – stefaanv Nov 15 '22 at 18:11
  • 1
    The fact that "processing_completed becomes true" in one execution thread gives no guarantee whatsoever that the true value becomes visible in any other execution thread, unless both execution threads are properly synchronized. C++ threads don't work this way. Thread synchronization is a complex topic, and you are referred to your C++ textbook for more information. – Sam Varshavchik Nov 15 '22 at 18:11
  • I feel like I might not have posted enough code. Regardless, the code I posted executes how I want it to: if all the threads are waiting, it enters the first if. If the program completes as normal, it enters the second if. My problem is that calling exit(1) doesn't exit the program, which makes sense to me. But I'm not sure of the best way to monitor the execution of a multithreaded program and kill it if it stops making progress. – Patrick McKeever Nov 15 '22 at 18:14
  • I don't think this would be the way to go, but `pthread` also offers robust mutexes. These can be used to detect deadlocks and act accordingly: https://man7.org/linux/man-pages/man3/pthread_mutexattr_setrobust.3.html Note that not all systems offer this feature. – Cedric Nov 15 '22 at 18:17
  • @Cedric C++ includes `std::mutex` etc. since C++11. There's no need to use a platform specific C API for normal threading stuff in a modern C++ program. – Ted Lyngmo Nov 15 '22 at 18:18
  • _".. stops making progress...."_ The main process can use any cross-process communication method to indicate it is still running correctly, e.g. writing to a log file, a UDP heartbeat, shared memory, etc. The watchdog process monitors this communication, and if it detects no progress for an interval of time (how long depends on the main program's tasks), it takes action. – Richard Critten Nov 15 '22 at 18:19
  • 1
    Re, "exit(1), doesn't exit the program," Have you tried calling [`_exit()`](https://man7.org/linux/man-pages/man2/_exit.2.html) instead? Failing that, how about `kill(getpid(), SIGKILL)`? – Solomon Slow Nov 15 '22 at 18:29
  • @TedLyngmo For a single process, that is certainly true. But if multiple processes are involved (e.g. watchdog), I wouldn't know how to detect whether an owning process died holding the mutex. Is that possible without accessing the native handle? – Cedric Nov 15 '22 at 18:32
  • @SolomonSlow Thanks, that fixed my problem with the program not exiting! – Patrick McKeever Nov 15 '22 at 18:34
  • 1
    @Cedric I wouldn't use thread mutexes for interprocess synchronization at all. I'd probably look at something like https://theboostcpplibraries.com/boost.interprocess-synchronization to get that done. – Ted Lyngmo Nov 15 '22 at 18:43
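
Pulling the comments about synchronization and `_exit()` together, an in-process monitor could look roughly like the sketch below. It is only an illustration under assumptions: every name in it (`last_thread_iter_ms`, `record_iteration`, `wait_for_completion`, `abandon_run`) is hypothetical, the shared variables are assumed to be `std::atomic`, and `std::_Exit` is used as the standard C++ counterpart of the POSIX `_exit()` suggested above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <thread>

// All names here are hypothetical; they only mirror the question's variables.
// std::atomic lets workers write and the monitor read without a data race.
std::atomic<std::int64_t> last_thread_iter_ms{0};
std::atomic<bool> processing_completed{false};

// Milliseconds on a monotonic clock (unaffected by wall-clock adjustments).
std::int64_t now_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// Each worker calls this once per loop iteration.
void record_iteration() { last_thread_iter_ms.store(now_ms()); }

// Returns true once processing_completed is set, or false as soon as no
// iteration has been recorded for longer than `timeout`.
bool wait_for_completion(std::chrono::milliseconds timeout,
                         std::chrono::milliseconds poll_interval) {
    assert(timeout.count() > 0 && poll_interval.count() > 0);
    record_iteration();  // start the stall clock when monitoring begins
    for (;;) {
        if (processing_completed.load()) return true;
        if (now_ms() - last_thread_iter_ms.load() > timeout.count()) return false;
        std::this_thread::sleep_for(poll_interval);
    }
}

// Called when a stall is detected: emit the fallback output, then leave
// without running atexit handlers or static destructors.
[[noreturn]] void abandon_run() {
    // (write the boilerplate output here)
    std::_Exit(1);
}
```

The monitor thread would then do something like `if (!wait_for_completion(std::chrono::seconds(5), std::chrono::seconds(1))) abandon_run();`. Skipping normal shutdown is deliberate here: one plausible reason `exit(1)` hangs is that its cleanup waits on something a stuck thread still holds, and terminating the whole process also avoids killing individual threads mid-operation.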

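The external watchdog described in the comments can be sketched in the same spirit. The following is a minimal POSIX-only illustration, not a production design: the heartbeat travels over a pipe (one of several possible channels), and `Outcome` and `run_with_watchdog` are hypothetical names. The watchdog forks before any worker threads exist, kills the child with `SIGKILL` if no heartbeat arrives within the timeout, and leaves relaunching or writing the fallback output to the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { Finished, Killed, Error };

// Runs `work` in a child process. `work` receives the write end of a pipe and
// is expected to write at least one byte per iteration as its heartbeat.
Outcome run_with_watchdog(const std::function<void(int)>& work, int timeout_ms) {
    assert(timeout_ms > 0);
    int fds[2];
    if (pipe(fds) != 0) return Outcome::Error;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return Outcome::Error;
    }
    if (pid == 0) {        // child: do the real work, then exit immediately
        close(fds[0]);
        work(fds[1]);
        _exit(0);
    }

    close(fds[1]);         // parent keeps only the read end
    Outcome result = Outcome::Finished;
    for (;;) {
        pollfd p;
        p.fd = fds[0];
        p.events = POLLIN;
        p.revents = 0;
        int ready = poll(&p, 1, timeout_ms);
        if (ready < 0 && errno == EINTR) continue;  // interrupted, retry
        if (ready == 0) {                           // no heartbeat in time
            kill(pid, SIGKILL);
            result = Outcome::Killed;
            break;
        }
        if (ready < 0) {                            // unexpected poll failure
            kill(pid, SIGKILL);
            result = Outcome::Error;
            break;
        }
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n == 0) break;                          // EOF: child closed the pipe
        if (n < 0 && errno != EINTR) {
            kill(pid, SIGKILL);
            result = Outcome::Error;
            break;
        }
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);                       // reap the child
    return result;
}
```

Because the stalled program lives in its own process, killing it cannot corrupt the watchdog's state, which is the main advantage over an in-process monitor thread.
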
0 Answers