I want to reuse a set of worker threads. Each worker performs independent work, but all of them must start and stop processing as a coordinated team. I need an efficient way for each worker thread to block until the main thread tells them all to go, and an efficient way for the main thread to block until all worker threads have finished.
Each chunk of work takes only a few tens of microseconds, so the usual approach of creating a set of threads and then joining them all involves far too much overhead.
In pseudocode, the structure is as follows:
    main thread:
        create N threads
        forever
            prepare new independent work for each thread
            tell all N threads to run their part
            wait for all N threads to complete their work
            use results

    typical worker thread:
        forever
            wait to run
            do my work
            indicate to main my work is complete
My question is how best to perform this signaling and synchronization. I am not asking about how to divide up the work or move work to or from each thread; suffice it to say the threads do not interact.
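To make the pattern concrete, here is a minimal baseline sketch of the loop above, assuming C++11 and using one mutex, two condition variables, and a round counter. The names `WorkerTeam`, `run_round`, and `run_demo` are hypothetical and chosen only for illustration; this is one possible shape of the synchronization, not necessarily the most efficient one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a fixed team of threads released and awaited in rounds.
class WorkerTeam {
public:
    WorkerTeam(std::size_t n, std::function<void(std::size_t)> work)
        : n_(n), work_(std::move(work)) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this, i] { worker_loop(i); });
    }

    ~WorkerTeam() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        go_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // "tell all N threads to run" followed by "wait for all N threads".
    void run_round() {
        std::unique_lock<std::mutex> lock(m_);
        pending_ = n_;
        ++generation_;  // a new round number is the "go" signal
        go_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop(std::size_t id) {
        std::size_t seen = 0;  // last round this worker has handled
        for (;;) {
            {
                // "wait to run": block until a new round starts or shutdown.
                std::unique_lock<std::mutex> lock(m_);
                go_.wait(lock, [&] { return stop_ || generation_ != seen; });
                if (stop_) return;
                seen = generation_;
            }
            work_(id);  // "do my work", outside the lock
            {
                // "indicate to main my work is complete"
                std::lock_guard<std::mutex> lock(m_);
                if (--pending_ == 0) done_.notify_one();
            }
        }
    }

    std::size_t n_;
    std::function<void(std::size_t)> work_;
    std::mutex m_;
    std::condition_variable go_, done_;
    std::size_t generation_ = 0;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> threads_;
};

// Illustrative driver: worker `id` adds id + 1 to its own slot each round.
long run_demo(std::size_t n, int rounds) {
    std::vector<long> slot(n, 0);
    WorkerTeam team(n, [&](std::size_t id) { slot[id] += static_cast<long>(id) + 1; });
    long total = 0;
    for (int r = 0; r < rounds; ++r) {
        team.run_round();
        for (long v : slot) total += v;  // "use results"
    }
    return total;
}
```

The round counter lets a worker distinguish a fresh "go" from a spurious wake-up, and the threads are created once and reused across rounds. With work items of only tens of microseconds, the cost of sleeping and waking on a condition variable may be comparable to the work itself, so a barrier (`pthread_barrier_wait`, or `std::barrier` in C++20) or spinning on atomics might be worth measuring against this baseline.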