0

What I'm trying to do

I have various things that must run concurrently on a linux, until the program is told to stop through ctrl-C (in which case SIGINT is received) or the service is stopped (in which case SIGTERM is received)

What I've come up with

For each thing that need to be done concurrently, I have a class that launches a thread in the constructor and whose destructor makes the thread stop and joins it. It looks basically like this:

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <system_error>

class SomeClassToDoStuff
{
public:

    // Constructor
    SomeClassToDoStuff()
    {
        _thread = std::thread([this]() {
            while (true)
            {
                // Do some stuff
                ...

                // Before going on to the next iteration
                {
                    std::unique_lock<std::mutex> dataLock(_mutex);

                    // Wait for 2ms
                    if (!_shouldStopThreadExecution)
                    {
                        _conditionVariable.wait_for(dataLock, std::chrono::milliseconds(2));
                    }

                    // End the loop if requested
                    if (_shouldStopThreadExecution)
                    {
                        break;
                    }
                }
            }

            // Some cleanup
            ...
        });
    }

    // Destructor
    ~SomeClassToDoStuff()
    {
        if (_thread.joinable())
        {
            {
                std::lock_guard<std::mutex> dataLock(_mutex);
                _shouldStopThreadExecution = true;
            }

            _conditionVariable.notify_all();

            try
            {
                _thread.join();
            } catch (std::system_error const&) {}
        }
    }

private:
    mutable std::mutex _mutex;                  // Mutex to keep things thread-safe
    std::condition_variable _conditionVariable; // Condition variable used to wait
    std::thread _thread;                        // Thread in which to do the stuff
    bool _shouldStopThreadExecution = false;    // Whether to stop the thread's execution
};

Then my main() looks like this:

#include <atomic>
#include <chrono>
#include <csignal>
#include <iostream>
#include <thread>

namespace  {

std::atomic<int> programReturnValue(-1);  // If positive or zero, the program must return with that value

}

static void signalHandler(int sig)
{
    std::cout << "Signal received (" << sig << "). This iteration of the main loop will be the last." << std::endl;
    programReturnValue.store(0);
}

int main()
{
    // Make sure the program stops cleanly when receiving SIGTERM or SIGINT
    {
        std::signal(SIGTERM, signalHandler);
        std::signal(SIGINT, signalHandler);
    }

    SomeClassToDoStuffA a;
    SomeClassToDoStuffB b;
    SomeClassToDoStuffC c;
    SomeClassToDoStuffD d;

    while (programReturnValue.load() < 0)
    {
        // Check that everything is alright
        if (someCondition)
        {
            programReturnValue.store(1);
        }

        // Wait for 100us to avoid taking all of the processor's resources
        std::this_thread::sleep_for(std::chrono::microseconds(100));
    }

    return programReturnValue.load();
}

(By the way, if there's an easier way to go about all this I'm interested to know)

The issue

When I hit ctrl+C or end the service, the program prints that signal 2 or 15 has been received (depending on which I used), and the program ends, which is good. However:

  1. The cleanup involves writing something to a file (in which things are successfully written during execution), but it seems that that doesn't always happen, which means that the cleanup isn't fully performed all the time, and that is a problem
  2. The return code of the program isn't 0 as expected, or even 1, but either 130 or 143 depending on what signal is received

Why is that, and what am I doing wrong?

Edit: From what I understand, 130 and 143 are actually 128 + signal, i.e. what the program would return if I didn't try to handle the signals

Edit2: I'm getting a better idea of what's happening, and only half the issue seems to be coming from my program itself.

The program is actually run by a bash script, which then prints its return value and may relaunch it depending on the situation. Sending SIGINT and SIGTERM to the script is also supposed to send SIGTERM to the program.

It turns out that I suck at bash. I had written something like this:

#!/bin/sh

trap "killall myProgram --quiet --wait" 2 15

/path/to/myProgram&
wait $!
RETURN_VALUE=$?
echo "Code exited with return code ${RETURN_VALUE}"

# Some other stuff
...
  • ctrl-C while running the script in terminal actually leads to the program receiving both SIGINT then SIGTERM
  • the return code I'm printing is actually the result of wait+trap, not my program's

I will rework the script, but can the fact that both signals are sent to my program the reason why the cleanup fails sometimes? How? What can I do about it?

Eternal
  • 2,648
  • 2
  • 15
  • 21
  • I am currently working on a similar problem/setup. I like your control-loop with the internal conditional variable, for controlling lock and wake-up conditions. also the destructor checking for the execution-state of the internal thread is a good idea. Might adapt this idea. Thx – mzoll Apr 15 '21 at 07:58
  • I've just read a bit about `std::signal` from https://en.cppreference.com/w/cpp/utility/program/signal and apparently you pass an illegal signal handling function. – ALX23z Apr 15 '21 at 08:07
  • @ALX23z How so? It says I can pass it `SIG_DFL`, `SIG_IGN`, or a pointer to a function with signature equivalent to `extern "C" void fun(int sig);`. I assumed my static function had the correct signature, are you saying there is some name mangling going on here? – Eternal Apr 15 '21 at 12:53
  • @ALX23z Or are you saying I'm making illegal operations in the function? Interacting with an std::atomic is ok... wait, is printing something undefined behaviour? – Eternal Apr 15 '21 at 13:01
  • Please use `sigaction` and `pthread_sigmask` to explicitly define signal handling behavior and to restrict handler execution to a single thread. There's something you're not showing us in the code or in your reproduction steps — but `sigaction` and masking will make make everything easier to understand. (E.g., you might have a signal death if you have two SIGINTs back-to-back and your system's `signal` restores disposition to SIG_DFL upon handler invocation.) – pilcrow Apr 15 '21 at 13:46
  • @Eternal apparently calling almost any STL function from the signal handler is UB. At least that's what is written in the reference. Changing the atomic should be fine. – ALX23z Apr 15 '21 at 14:49
  • @pilcrow I'm doing nothing more within the code, but what you said about two signals is probably what's going on here (see my edit). You say "my system's `signal`"... does it mean that `std::signal` (which is in the standard library) is actually so implementation-dependent that using an os-specific feature like `sigaction` is actually more reliable? Would it allow me to fix the issue (if so, please post this as an answer) – Eternal Apr 15 '21 at 15:06
  • @Eternal might I unrelated to the problem at hand ask about the purpose of the `condition_variable` and the `mutex`. Isn't it a bit superfluous, because you do not really need coordinated access to, as it is only written from outside (destructor), while the (internal) thread is only reading from it, where the exact time of access is kind of not important, because you are already at a place in the code, where no other data is accessed (eval looping-condition). Maybe I am missing something. – mzoll Apr 16 '21 at 10:14
  • @mzoll The destructor is run from the thread running `main()`, so at the very least there is a need for an atomic variable or a mutex to protect access to `_shouldStopThreadExecution`. As for the condition variable... well, what are the alternatives? If loop without any wait the thread will take a lot of processor resources, and if I use `std::this_thread::sleep_for()` I can't interrupt the sleep to end the thread – Eternal Apr 16 '21 at 10:36
  • @Eternal OKay, while I do not quite follow the argument with the protected access [you flip the state once, and your thread will _eventually_ see it], I do understand that scheduling on the CPU might be an issue, and a forced switch to another thread (possibly the one running the signal-handling in main()) by letting the internal thread wait, although even for a short time, is a viable point and an adequate solution. thanks for explaining. – mzoll Apr 16 '21 at 11:38
  • @mzoll No, the thread won't just "eventually see it". If there's a chance a thread reads or writes a variable as another thread writes it, that's called a race condition, and that's known to be incorrect code. Look it up – Eternal Apr 16 '21 at 13:40
  • @Eternal correct, this is a raise condition, but in this case you suffer no negative effects, because your critical region is during the wait of the 2ms; a quite short while. reading the wrong value (false), while the var is being modified to true (which should be super fast), will only have the outcome of another _continue_ in your looping thread, which is defined behavior. This is basically the same outcome as if the scheduler would have delayed your modifying thread by just a little bit, thus staggering the read/write-op, which you had no real control over in the first place. – mzoll Apr 19 '21 at 07:44

1 Answers1

0

I am a bit confused about your signal handling:

To me it seems you use the terminating System-signal only to set the return-value and break the while loop in main; the threads, or rather their wrappers are terminated i.e. destructed only at the time them going out of scope, which is at the end of your main-scope, when you already have returned! Thrown exceptions (in your destructors) cannot be caught anymore. your threads are therefor not ended yet, while you have already returned from main.

As a solution: I would recommend to set the stopping state _shouldStopThreadExecution at the time the main receives the signal for stopping already. And then remove the try statements for the .join() in your destructor in order to see the correct ending of the threads under quaranty.

mzoll
  • 475
  • 3
  • 11
  • Even for `main()` classes need to first be destroyed before return triggers. So I am uncertain what you mean. – ALX23z Apr 15 '21 at 08:21
  • I... think you don't understand when destructors are called. If objects are constructed within a scope, the destructors are called (in reverse order of construction) when leaving the scope, no matter if we do so through a closing brace, a return instruction or throwing an exception. That's also why there are no "thrown exceptions (in your destructors)": destructors must be noexcept, otherwise if they throw like this it causes program termination. It's a bad practice, and not in my code (notice the try/catch around the one thing that can throw). – Eternal Apr 15 '21 at 11:51