
I currently have a program with x workers in my threadpool. During the main loop y tasks are assigned to the workers to complete, but after the tasks are sent out I must wait for all tasks to finish before proceeding with the program. I believe my current solution is inefficient; there must be a better way to wait for all the tasks to finish, but I am not sure how to go about it.

// called in main after all tasks are enqueued to 
// std::deque<std::function<void()>> tasks
void ThreadPool::waitFinished()
{
    while(!tasks.empty()) //check if there are any tasks in queue waiting to be picked up
    {
        //do literally nothing
    }
}

More information:

threadpool structure

//worker thread objects
class Worker {
    public:
        Worker(ThreadPool& s): pool(s) {}
        void operator()();
    private:
        ThreadPool &pool;
};

//thread pool
class ThreadPool {
    public:
        ThreadPool(size_t);
        template<class F>
        void enqueue(F f);   
        void waitFinished();
        ~ThreadPool();
    private:
        friend class Worker;
        //keeps track of threads so we can join
        std::vector< std::thread > workers;
        //task queue
        std::deque< std::function<void()> > tasks;
        //sync
        std::mutex queue_mutex;
        std::condition_variable condition;
        bool stop;
};

or here's a gist of my threadpool.hpp

example of what I want to use waitFinished() for:

while(running)
    //....
    for all particles alive
        push particle position function to threadpool
    end for

    threadPool.waitFinished();

    push new particle position data into openGL buffer
end while

so this way I can send hundreds of thousands of particle position tasks to be done in parallel, wait for them to finish, and put the new data inside the OpenGL position buffers
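To make that concrete, here is a rough sketch of the frame loop I have in mind; Particle, updateFrame, the chunk size and the position update are placeholders rather than my real code, and the waitFinished() call is the part I'm asking about:

#include <algorithm>  // std::min
#include <cstddef>    // std::size_t
#include <vector>

struct Particle { float x, y, z; };

void updateFrame(ThreadPool& pool, std::vector<Particle>& particles, float dt)
{
    // enqueue the particles in chunks so each task does a useful amount of work
    const std::size_t chunk = 4096;
    for (std::size_t begin = 0; begin < particles.size(); begin += chunk)
    {
        const std::size_t end = std::min(begin + chunk, particles.size());
        pool.enqueue([&particles, begin, end, dt]() {
            for (std::size_t i = begin; i < end; ++i)
                particles[i].x += dt; // placeholder position update
        });
    }

    pool.waitFinished(); // block until every chunk has been processed

    // only now copy the fresh positions into the OpenGL position buffer
}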

Syntactic Fructose
  • You could use [`std::condition_variable`](http://en.cppreference.com/w/cpp/thread/condition_variable) and [`wait_until`](http://en.cppreference.com/w/cpp/thread/condition_variable/wait_until). – user703016 May 27 '14 at 18:29
  • @WilliamAndrewMontgomery I attempted to do this but I would need a mutex wouldn't I? This would lock tasks stopping all other threads – Syntactic Fructose May 27 '14 at 18:37
  • In windows, you could use an array of handles returned by CreateThread(), and then use WaitForMultipleObjects() with that array to wait for all threads to terminate (as a single atomic operation). I'm wondering if there's an equivalent to this in STL. – rcgldr May 27 '14 at 20:03
  • Or better, keep a `std::vector` with all of the worker threads in it, and [`join`](http://en.cppreference.com/w/cpp/thread/thread/join "std::thread::join") each of them in `waitFinished`. – Casey May 27 '14 at 20:16
  • Both the Windows and std::thread solutions suggested assume your thread pool threads will *terminate* when no more work is available. If that assumption is valid (and I doubt it is), it is a straight-forward solution. Otherwise, a cvar+mtx should be used (and you already have a mutex, btw. its the one you *better* be using to protect that queue unless its lockless). – WhozCraig May 27 '14 at 20:19
  • @Casey will join() end the thread however? I want the threads to be constantly running throughout the program until the destructor is called. And I do already keep a vector of threads ;) – Syntactic Fructose May 27 '14 at 20:20
  • @WhozCraig there will be certain cases the workers have no tasks but I want the threads to continue, but last time I tried to use a mutex in my function it just locked the tasks container and froze everything as the other queued tasks could not be sent off. – Syntactic Fructose May 27 '14 at 20:22
  • @SyntacticFructose I understand. Its not as complicated as it may first seem. Are you using the `std::thread` library from C++11? If so, I can likely craft you something usable in short-order. What are you using for your threading model? – WhozCraig May 27 '14 at 20:24
  • @WhozCraig I updated my post with more information and also linked a gist of threadpool.hpp, hopefully that makes my problem clearer. – Syntactic Fructose May 27 '14 at 20:28
  • it does. Not clear on the purpose of Worker, but the idea behind the queue of function objects to invoke is certainly clear enough. This should be pretty straight-up. Unless someone gets there before me I'll see what I can do for you. The important thing to keep in mind is the purpose of a cv+mtx. They're **not** for holding state; they're for protecting predicate data (which is the *real* state) and signaling changes to that data therein. Anyway, give me a few minutes. – WhozCraig May 27 '14 at 20:30
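For reference, the join-based wait suggested in the comments would look roughly like the sketch below (the `waitAll` helper is illustrative, not from the real code). As noted above, it only returns once the worker threads have actually terminated, so it does not fit a pool whose threads are meant to stay alive for the lifetime of the program:

#include <thread>
#include <vector>

// waits by joining: every worker must have exited its loop before this returns
void waitAll(std::vector<std::thread>& workers)
{
    for (std::thread& t : workers)
        if (t.joinable())
            t.join();
    workers.clear(); // the threads are gone; the pool would have to respawn them
}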

1 Answer


This is one way to do what you're trying to do. Using two condition variables on the same mutex is not for the light-hearted unless you know what is going on internally. I didn't need the atomic `processed` member other than to demonstrate how many items were finished between each run.

The sample workload function generates 100,000 random int values, then sorts them (gotta heat my office one way or another). waitFinished will not return until the queue is empty and no threads are busy.

#include <iostream>
#include <vector>
#include <deque>
#include <functional>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <atomic>
#include <random>
#include <algorithm>
#include <iterator>
#include <cstdlib>

//thread pool
class ThreadPool
{
public:
    ThreadPool(unsigned int n = std::thread::hardware_concurrency());

    template<class F> void enqueue(F&& f);
    void waitFinished();
    ~ThreadPool();

    unsigned int getProcessed() const { return processed; }

private:
    std::vector< std::thread > workers;
    std::deque< std::function<void()> > tasks;
    std::mutex queue_mutex;
    std::condition_variable cv_task;
    std::condition_variable cv_finished;
    std::atomic_uint processed;
    unsigned int busy;
    bool stop;

    void thread_proc();
};

ThreadPool::ThreadPool(unsigned int n)
    : processed(0)
    , busy(0)
    , stop(false)
{
    for (unsigned int i=0; i<n; ++i)
        workers.emplace_back(std::bind(&ThreadPool::thread_proc, this));
}

ThreadPool::~ThreadPool()
{
    // set stop-condition
    std::unique_lock<std::mutex> latch(queue_mutex);
    stop = true;
    cv_task.notify_all();
    latch.unlock();

    // all threads terminate, then we're done.
    for (auto& t : workers)
        t.join();
}

void ThreadPool::thread_proc()
{
    while (true)
    {
        std::unique_lock<std::mutex> latch(queue_mutex);
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        if (!tasks.empty())
        {
            // got work. set busy.
            ++busy;

            // pull from queue
            auto fn = tasks.front();
            tasks.pop_front();

            // release lock. run async
            latch.unlock();

            // run function outside context
            fn();
            ++processed;

            // re-acquire the lock before touching shared state: decrementing
            // 'busy' or notifying outside the mutex could race with the
            // predicate check in waitFinished() and the wakeup would be lost
            latch.lock();
            --busy;
            cv_finished.notify_one();
        }
        else if (stop)
        {
            break;
        }
    }
}

// generic function push
template<class F>
void ThreadPool::enqueue(F&& f)
{
    std::unique_lock<std::mutex> lock(queue_mutex);
    tasks.emplace_back(std::forward<F>(f));
    cv_task.notify_one();
}

// waits until the queue is empty.
void ThreadPool::waitFinished()
{
    std::unique_lock<std::mutex> lock(queue_mutex);
    cv_finished.wait(lock, [this](){ return tasks.empty() && (busy == 0); });
}

// a cpu-busy task.
void work_proc()
{
    std::random_device rd;
    std::mt19937 rng(rd());

    // build a vector of random numbers
    std::vector<int> data;
    data.reserve(100000);
    std::generate_n(std::back_inserter(data), data.capacity(), [&](){ return rng(); });
    std::sort(data.begin(), data.end(), std::greater<int>());
}

int main()
{
    ThreadPool tp;

    // run five batches of 100 items
    for (int x=0; x<5; ++x)
    {
        // queue 100 work tasks
        for (int i=0; i<100; ++i)
            tp.enqueue(work_proc);

        tp.waitFinished();
        std::cout << tp.getProcessed() << '\n';
    }

    // destructor will close down thread pool
    return EXIT_SUCCESS;
}

Output

100
200
300
400
500

Best of luck.

WhozCraig
  • just curious, would I be able to pass a lambda function into enqueue? I have about 3-4 vars that my function requires. Thanks a ton – Syntactic Fructose May 27 '14 at 21:32
  • @SyntacticFructose so long as it conforms to `void()`, certainly. Otherwise packing and unpacking params into a delayed dispatch-capable entity gets, in a word, complicated (but still doable). Varying the work proc would likely be better done with a simple object that exposes `operator()` (which I guess is what your `Worker` class was all about, now that I think about it). – WhozCraig May 27 '14 at 21:43
  • hmm my program is hanging and I believe it is because my lambda is [&](){//...} and uses a couple of variables from the loop. Could that be why? It seems your code works so it's the implementation of mine once again. I'm making this a bounty +50, you've helped a lot – Syntactic Fructose May 27 '14 at 21:46
  • There is a race condition since `busy` is decremented without being protected by `queue_mutex`: When this modification of `busy` and the following call to `notify_one()` happen right after the lambda condition in `waitFinished()` is evaluated (namely to `false`) and right before the actual waiting begins, the notification will be missed and the caller of `waitFinished()` will wait forever. I think that's the reason why @SyntacticFructose experiences a hanging program. Maybe you can edit your otherwise great answer accordingly. – ph4nt0m Sep 05 '15 at 20:42
  • @ph4nt0m You're totally right. I'll have to rework that and latch prior to the member mods. Thanks for catching that, and to all the folks that up-ticked your comment (I joined them =P). – WhozCraig Oct 03 '16 at 05:24
  • Don't know if the fix is up yet, if not I tried to post one a while ago but didn't have enough reputation at the time to post comments. You can find it here: https://stackoverflow.com/review/suggested-edits/9120592. Basically I just added a `lock.lock()` at a particular spot to make it work. It was a lot of fun to debug and your answer helped me tremendously, thanks a lot. – deb0ch May 04 '18 at 12:22
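On the lambda question in the comments: anything convertible to `std::function<void()>` can be enqueued, but capturing loop variables by reference is a likely cause of the hang described above, since the captured variable changes (and eventually goes out of scope) before the worker runs the task. A minimal sketch, assuming the ThreadPool from the answer; the names and the per-particle work are made up:

int demo()
{
    ThreadPool tp;                              // the pool from the answer above
    std::vector<float> positions(100000, 0.0f); // made-up per-particle data

    for (std::size_t i = 0; i < positions.size(); ++i)
    {
        // capture i by value: capturing it by reference would let it change
        // (and go out of scope) before a worker runs the task
        tp.enqueue([&positions, i]() {
            positions[i] += 1.0f;               // placeholder per-particle work
        });
    }

    tp.waitFinished();                          // positions is now safe to read
    return 0;
}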