I've implemented a "Ticket" class which is shared as a shared_ptr between multiple threads.
The program flow is like this:
- parallelQuery() is called to start a new query job. A shared instance of Ticket is created.
- The query is split into multiple tasks, each task is enqueued on a worker thread (this part is important, otherwise I'd just join threads and done). Each task gets the shared ticket.
- ticket.wait() is called to wait for all tasks of the job to complete.
- When one task is done it calls the done() method on the ticket.
- When all tasks are done the ticket is unlocked, result data from the task aggregated and returned from parallelQuery()
In pseudo code:
std::vector<T> parallelQuery(std::string str) {
auto ticket = std::make_shared<Ticket>(2);
auto task1 = std::make_unique<Query>(ticket, str+"a");
addTaskToWorker(task1);
auto task2 = std::make_unique<Query>(ticket, str+"b");
addTaskToWorker(task2);
ticket->waitUntilDone();
auto result = aggregateData(task1, task2);
return result;
}
My code works. But I wonder if it is theoretically possible that it can lead to a deadlock in case when unlocking the mutex is executed right before it gets locked again by the waiter thread calling waitUntilDone().
Is this a possibility, and how to avoid this trap?
Here is the complete Ticket class, note the execution order example comments related to the problem description above:
#include <mutex>
#include <atomic>
class Ticket {
public:
Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) {
_mutex.lock();
}
void waitUntilDone() {
_doneLock.lock();
if (_done != _numTasks) {
_doneLock.unlock(); // Execution order 1: "waiter" thread is here
_mutex.lock(); // Execution order 3: "waiter" thread is now in a dealock?
}
else {
_doneLock.unlock();
}
}
void done() {
_doneLock.lock();
_done++;
if (_done == _numTasks) {
_mutex.unlock(); // Execution order 2: "task1" thread unlocks the mutex
}
_doneLock.unlock();
}
void cancel() {
_canceled = true;
_mutex.unlock();
}
bool wasCanceled() {
return _canceled;
}
bool isDone() {
return _done >= _numTasks;
}
int getNumTasks() {
return _numTasks;
}
private:
std::atomic<int> _numTasks;
std::atomic<int> _done;
std::atomic<bool> _canceled;
// mutex used for caller wait state
std::mutex _mutex;
// mutex used to safeguard done counter with lock condition in waitUntilDone
std::mutex _doneLock;
};
One possible solution which just came to my mind when editing the question is that I can put _done++; before the _doneLock(). Eventually, this should be enough?
Update
I've updated the Ticket class based on the suggestions provided by Tomer and Phil1970. Does the following implementation avoid mentioned pitfalls?
class Ticket {
public:
Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) { }
void waitUntilDone() {
std::unique_lock<std::mutex> lock(_mutex);
// loop to avoid spurious wakeups
while (_done != _numTasks && !_canceled) {
_condVar.wait(lock);
}
}
void done() {
std::unique_lock<std::mutex> lock(_mutex);
// just bail out in case we call done more often than needed
if (_done == _numTasks) {
return;
}
_done++;
_condVar.notify_one();
}
void cancel() {
std::unique_lock<std::mutex> lock(_mutex);
_canceled = true;
_condVar.notify_one();
}
const bool wasCanceled() const {
return _canceled;
}
const bool isDone() const {
return _done >= _numTasks;
}
const int getNumTasks() const {
return _numTasks;
}
private:
std::atomic<int> _numTasks;
std::atomic<int> _done;
std::atomic<bool> _canceled;
std::mutex _mutex;
std::condition_variable _condVar;
};