Avoiding deadlock in concurrent waiting object

Question

I've implemented a "Ticket" class which is shared as a shared_ptr between multiple threads.

The program flow is like this:

parallelQuery() is called to start a new query job. A shared instance of Ticket is created.
The query is split into multiple tasks, each task is enqueued on a worker thread (this part is important, otherwise I'd just join threads and done). Each task gets the shared ticket.
ticket.wait() is called to wait for all tasks of the job to complete.
When one task is done it calls the done() method on the ticket.
When all tasks are done the ticket is unlocked, result data from the task aggregated and returned from parallelQuery()

In pseudo code:

     std::vector<T> parallelQuery(std::string str) {
         auto ticket = std::make_shared<Ticket>(2);
         auto task1 = std::make_unique<Query>(ticket, str+"a");
         addTaskToWorker(task1);
         auto task2 = std::make_unique<Query>(ticket, str+"b");
         addTaskToWorker(task2);
         ticket->waitUntilDone();
         auto result = aggregateData(task1, task2);
         return result;
     }

My code works. But I wonder if it is theoretically possible that it can lead to a deadlock in case when unlocking the mutex is executed right before it gets locked again by the waiter thread calling waitUntilDone().

Is this a possibility, and how to avoid this trap?

Here is the complete Ticket class, note the execution order example comments related to the problem description above:

#include <mutex>
#include <atomic>

    class Ticket {
    public:
        Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) {
            _mutex.lock();
        }

        void waitUntilDone() {
            _doneLock.lock();
            if (_done != _numTasks) {
                _doneLock.unlock(); // Execution order 1: "waiter" thread is here
                _mutex.lock(); // Execution order 3: "waiter" thread is now in a dealock?
            }
            else {
                _doneLock.unlock();
            }
        }

        void done() {
            _doneLock.lock();
            _done++;
            if (_done == _numTasks) {
                _mutex.unlock(); // Execution order 2: "task1" thread unlocks the mutex
            }
            _doneLock.unlock();
        }

        void cancel() {
            _canceled = true;
            _mutex.unlock();
        }

        bool wasCanceled() {
            return _canceled;
        }

        bool isDone() {
            return _done >= _numTasks;
        }

        int getNumTasks() {
            return _numTasks;
        }

    private:
        std::atomic<int> _numTasks;
        std::atomic<int> _done;
        std::atomic<bool> _canceled;
        // mutex used for caller wait state
        std::mutex _mutex;
        // mutex used to safeguard done counter with lock condition in waitUntilDone
        std::mutex _doneLock;
    };

One possible solution which just came to my mind when editing the question is that I can put _done++; before the _doneLock(). Eventually, this should be enough?

Update

I've updated the Ticket class based on the suggestions provided by Tomer and Phil1970. Does the following implementation avoid mentioned pitfalls?

class Ticket {
public:
    Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) { }

    void waitUntilDone() {
        std::unique_lock<std::mutex> lock(_mutex);
        // loop to avoid spurious wakeups
        while (_done != _numTasks && !_canceled) {
            _condVar.wait(lock);
        }
    }

    void done() {
        std::unique_lock<std::mutex> lock(_mutex);
        // just bail out in case we call done more often than needed
        if (_done == _numTasks) {
            return;
        }
        _done++;
        _condVar.notify_one();
    }

    void cancel() {
        std::unique_lock<std::mutex> lock(_mutex);
        _canceled = true;
        _condVar.notify_one();
    }

    const bool wasCanceled() const {
        return _canceled;
    }

    const bool isDone() const {
        return _done >= _numTasks;
    }

    const int getNumTasks() const {
        return _numTasks;
    }

private:
    std::atomic<int> _numTasks;
    std::atomic<int> _done;
    std::atomic<bool> _canceled;
    std::mutex _mutex;
    std::condition_variable _condVar;
};

I do apologize in advance, but the code looks a bit messy because you're trying to force a mutex which may not be the best synchronization mechanism for your use case. I could be wrong, but I recommend you try using "Cond" (+ mutex) instead: https://en.cppreference.com/w/cpp/thread/condition_variable — Tomer, Aug 10 '19 at 12:46
waitUntilDone - should probably be a while (but again it's hard to understand the code). — Tomer, Aug 10 '19 at 12:49
@Tomer thank you for your comment. The reason I initially did not use a conditional var was that the ticket has only one waiter. However, it seems a better choice indeed. How about my updated version? — benjist, Aug 10 '19 at 13:48
What you actually are implementing is a `std::barrier`: https://en.cppreference.com/w/cpp/experimental/barrier. These will become available in C++20. Until that time, use a `std::condition_variable`. — G. Sliepen, Aug 10 '19 at 13:51
@benjist looks much better now! the lock in "done()" should go to the top, otherwise there is a scenario in which "done" will be larger than the number of tasks and therefore your "while" loop may not exit. (alternatively) you may do "while (done < num_of_tasks..." — Tomer, Aug 10 '19 at 14:05
@benjist Don'r edit questions changing the original problem that was asked for. Especially not if their are answers already based on your original code. That's a massive disrespect against the answerer. — πάντα ῥεῖ, Aug 10 '19 at 14:11
I have really no idea what you are talking about. The question was, and still is, whether a deadlock can occur for given code. @Tomer pointed out a conditional variable is a better choice. The answer has been edited using the suggestions from Tomer and Phil1970. Updated code, same problem, same question. Please do not edit my question again. — benjist, Aug 10 '19 at 14:24
Not proposing a solution, but you should read about semaphores. A mutex is not a plug in replacement for a semaphore, not even a binary one. — curiousguy, Aug 10 '19 at 15:02
As shown in my answer, you should use the `wait` overload with a predicate instead of a loop. I think it will avoid a thread switching if the condition is not yet verified — Phil1970, Aug 10 '19 at 15:08
You are using `const` all wrong: `const` on a really pure rvalue, like an `int` or a `bool` is meaningless. The returned value of a function returning a scalar is inherently not modifiable; **it is not a variable, does not have an address, is not an object.** So `const` is useless here, but would be very useful on class members that should not be modified by design: do you want `_numTasks` to be modifiable? — curiousguy, Aug 10 '19 at 15:14
Well in C++, it seems that the predicate version is essentially equivalent to the explicit loop version. However, **it is still preferable to use the predicate for readability** and reduce the possibility of writing incorrect code — Phil1970, Aug 10 '19 at 15:30
Another think with original code is that **the thread that lock a mutex must be the one that unlock it**. I suspect it is not the case for the `done` method. — Phil1970, Aug 10 '19 at 15:49

Phil1970 · Accepted Answer · 2019-08-10T18:41:01.973

Don't write your own wait methods but use std::condition_variable instead.

https://en.cppreference.com/w/cpp/thread/condition_variable.

Mutexes usage

Generally, a mutex should protect a given region of code. That is, it should lock, do its work and unlock. In your class, you have multiple method where some lock _mutex while other unlock it. This is very error-prone as if you call the method in the wrong order, you might well be in an inconsistant state. What happen if a mutex is lock twice? or unlocked when already unlocked?

The other thing to be aware with mutex is that if you have multiple mutexes, it that you can easily have deadlock if you need to lock both mutexes but don't do it in consistant order. Suppose that thread A lock mutex 1 first and the mutex 2, and thread B lock them in the opposite order (mutex 2 first). There is a possibility that something like this occurs:

Thread A lock mutex 1
Thread B lock mutex 2
Thread A want to lock mutex 2 but cannot as it is already locked.
Thread B want to lock mutex 1 but cannot as it is already locked.
Both thread will wait forever

So in your code, you should at least have some checks to ensure proper usage. For example, you should verify _canceled before unlocking the mutex to ensure cancel is called only once.

Solution

I will just gave some ideas

Declare a mutux and a condition_variable to manage the done condition in your class.

std::mutex doneMutex;
std::condition_variable done_condition;

Then waitUntilDone would look like:

void waitUntilDone()
{
    std::unique_lock<std::mutex> lk(doneMutex);
    done_condition.wait(lk, []{ return isDone() || wasCancelled();});
}

And done function would look like:

void done() 
{
    std::lock_guard<std::mutex> lk(doneMutex);
    _done++;
    if (_done == _numTasks) 
    {
        doneCondition.notify_one();
    }
}

And cancel function would become

void done() 
{
    std::lock_guard<std::mutex> lk(doneMutex);
    _cancelled = true;
    doneCondition.notify_one();
}

As you can see, you only have one mutex now so you basically eliminate the possibility of a deadlock.

Variable naming

I suggest you to not use lock in the name of you mutex since it is confusing.

std::mutex someMutex;
std::guard_lock<std::mutex> someLock(someMutex); // std::unique_lock when needed

That way, it is far easier to know which variable refer to the mutex and which one to the lock of the mutex.

Good reading

If you are serious about multithreading, then you should buy that book:

C++ Concurrency in Action
Practical Multithreading
Anthony Williams

Code Review (added section)

Essentially same code has beed posted to CODE REVIEW: https://codereview.stackexchange.com/questions/225863/multithreading-ticket-class-to-wait-for-parallel-task-completion/225901#225901.

I have put an answer there that include some extra points.

Thank you very much for your detailed answer. I think my updated version works very similar to your suggestion. Do you see an issue with the last update? — benjist, Aug 10 '19 at 13:52
@benjist https://stackoverflow.com/questions/57441479/avoiding-deadlock-in-concurrent-waiting-object#comment101361836_57441479 — πάντα ῥεῖ, Aug 10 '19 at 14:12
You should put your condition in a lambda and not a loop as in my implantation of `waitUntilDone`. — Phil1970, Aug 10 '19 at 15:04
The issue with your updated question is that it is a massive change. My answer was based on original code. — Phil1970, Aug 10 '19 at 16:27

Adel Chepkunov · Answer 2 · 2019-08-11T01:07:38.937

0

You not need to use mutex for operate with atomic values

UPD

my answer to mainn question was wrong. I deleted one.

You can use simple (non atomic) int _numTasks; also. And you not need shared pointer - just create Task on the stack and pass pointer

     Ticket ticket(2);
     auto task1 = std::make_unique<Query>(&ticket, str+"a");
     addTaskToWorker(task1);

or unique ptr if you like

     auto ticket = std::make_unique<Ticket>(2);
     auto task1 = std::make_unique<Query>(ticket.get(), str+"a");
     addTaskToWorker(task1);

because shared pointer can be cut by Occam's razor :)

edited Aug 11 '19 at 01:07

answered Aug 10 '19 at 12:02

Adel Chepkunov

79
1
9

The increment and access to the counter variables is not the problem at all, that is not the reason for the mutex. The problem is in the time it takes to unlock(), compare and lock(). There is eventually no guaranty that lock() in the wait method is called before the unlock() got called in the done method. Or is there? One thread can be slower than another. – benjist Aug 10 '19 at 12:19
@benjist As mentionned in my answer, if you only have one mutex to lock, then you avoid the possibility of a deadlock related to the order in which look were obtained. **However, the potential problem here is that `done` would have to be called from the thread that create the `Ticket` object**. – Phil1970 Aug 10 '19 at 15:45
@Phil1970, I can not catch, why need notifications in this case. _mutex locked in constructor. In any thread we can call done(). done() will release mutex if _done == _numTasks. Thread which call ticket->waitUntilDone(); will be paused if _done != _numTasks. Where is the problem? – Adel Chepkunov Aug 11 '19 at 00:35
@AdelChepkunov See my answer for more details including **code review** section. The main problem is that unlocking a mutex from a thread other than the one that lock it is **undefined behavior** (not a semaphore) – Phil1970 Aug 11 '19 at 00:49

Avoiding deadlock in concurrent waiting object

2 Answers2