The program I am implementing iterates over a moderate amount of independent data, performs some computation on it, collects the results, and then repeats. This loop needs to execute very quickly. To solve this, I am trying to implement the following thread pattern.
Rather than spawning threads in setup and joining them in collect, I would like to spawn all threads once up front and keep them synchronized throughout their loops. This question regarding thread barriers initially seemed to point me in the right direction, but my implementation of them is not working. Below is my example:
#include <barrier>
#include <thread>
#include <vector>

int main() {
    int counter = 0;
    int threadcount = 10;
    // Completion function: runs once per barrier phase, after the last thread arrives
    auto on_completion = [&]() noexcept {
        ++counter; // Increment counter
    };
    std::barrier sync_point(threadcount, on_completion);
    auto work = [&]() {
        while (true)
            sync_point.arrive_and_wait(); // Keep cycling the sync point
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < threadcount; ++i)
        threads.emplace_back(work); // Start every thread
    for (auto& thread : threads)
        thread.join();
}
To keep things as simple as possible, there is no computation being done in the worker threads, and I have done away with the setup thread. I am simply cycling the threads, syncing them after each cycle, and keeping a count of how many times they have looped. However, this code deadlocks very quickly. The more threads there are, the sooner it deadlocks. Adding work or a delay inside the compute threads postpones the deadlock, but does not prevent it.
Am I abusing the thread barrier? Is this unexpected behavior? Is there a cleaner way to implement this pattern?
Edit
It looks like removing the on_completion function gets rid of the deadlock. I tried a different approach to meet the synchronization requirements without using the function, but it still deadlocks fairly quickly.
#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    int threadcount = 10;
    // Each barrier expects every worker plus the main thread
    std::barrier start_point(threadcount + 1);
    std::barrier stop_point(threadcount + 1);
    auto work = [&](int i) { // i: worker index (unused in this minimal example)
        while (true) {
            start_point.arrive_and_wait();
            stop_point.arrive_and_wait();
        }
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < threadcount; ++i) {
        threads.emplace_back(work, i);
    }
    while (true) {
        std::cout << "Setup" << std::endl;
        start_point.arrive_and_wait(); // Sync to start
        // Workers do work here
        stop_point.arrive_and_wait(); // Sync to end
        std::cout << "Collect" << std::endl;
    }
}