The Intel® Threading Building Blocks (Intel® TBB) library includes constructs designed for exactly this kind of problem. Intel® TBB is a cross-platform library that aids in multithreaded programming.
You can view the entities involved in your application as four different task providers: one type of task supplies the input data, another type is produced by the first manipulation routine, and so on.
Thus, the only thing you need to do is provide the bodies for those tasks. The library offers several APIs for specifying which bodies to execute and how to run them in parallel; everything else (thread creation, synchronization between tasks, load balancing, etc.) is handled by the library.
The simplest solution that comes to mind uses the parallel_pipeline function. Here is a sketch:
#include "tbb/pipeline.h"

using namespace tbb;

int main() {
    parallel_pipeline( /*max number of bodies executed in parallel, e.g.*/ 16,
        make_filter<void, input_data_type>(
            filter::serial_in_order, // read data sequentially
            [](flow_control& fc) -> input_data_type {
                if ( /*some stop condition: EOF, etc.*/ ) {
                    fc.stop();
                    return input_data_type(); // return dummy value
                }
                auto input_data = read_data();
                return input_data;
            }
        ) &
        make_filter<input_data_type, manipulator1_output_type>(
            filter::parallel, // process data in parallel by the first manipulator
            [](input_data_type elem) -> manipulator1_output_type {
                auto processed_elem = manipulator1::process(elem);
                return processed_elem;
            }
        ) &
        make_filter<manipulator1_output_type, manipulator2_output_type>(
            filter::parallel, // process data in parallel by the second manipulator
            [](manipulator1_output_type elem) -> manipulator2_output_type {
                auto processed_elem = manipulator2::process(elem);
                return processed_elem;
            }
        ) &
        make_filter<manipulator2_output_type, void>(
            filter::serial_in_order, // visualize frame by frame
            [](manipulator2_output_type elem) {
                visualize(elem);
            }
        )
    );
    return 0;
}
provided that the necessary functions (read_data, visualize) are implemented. Here input_data_type, manipulator1_output_type, etc. are the types passed between pipeline stages, and the manipulators' process functions do the necessary computation on the passed arguments.
By the way, to avoid dealing with locks and other synchronization primitives yourself, you can use concurrent_bounded_queue from the library: put your input data into the queue, possibly from a different thread (e.g. one dedicated to IO operations), with concurrent_bounded_queue_instance.push(elem), and then read it back via input_data_type elem; concurrent_bounded_queue_instance.pop(elem). Note that popping an item is a blocking operation here; concurrent_queue provides a non-blocking try_pop alternative.
The other possibility is to use tbb::flow_graph and its nodes to organize the same pipelining scheme. Take a look at the two examples that describe dependency graphs and data flow graphs. You might need a sequencer_node to restore the correct ordering of items (if necessary).
It is also worth reading the SO questions tagged tbb to see how other people use this library.