I have a number of "sites" (m of them), each of which has to process every event (an event is a chunk of data; all events are available from the start). Each of the n events is sent to every site for processing, so you can think of this as n×m tasks. The order of processing is not important. The only constraint is that a site may not process more than one event at a time (for a given site s, Task(s,x) cannot run in parallel with Task(s,y)).
Currently it is implemented with an OpenMP "parallel for" over the sites, nested inside a regular for loop over the events:
    for(...event...)
        #pragma omp parallel for
        for(...site...)
            site.process(event)
This works fine, but not every site takes the same time on each event, i.e. all sites have to wait for the slowest site before moving on to the next event. I estimate that letting workers move on to the next event without waiting could save about a factor of two in run time.
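To make the relaxed ordering concrete, here is a minimal sketch of what I have in mind, assuming OpenMP 4.0 or later (task `depend` clauses). `Site`, `Event` and `process_all` are placeholders invented for illustration, not my real classes, and I have not benchmarked this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Event { int id; };            // placeholder for a chunk of data

struct Site {                        // placeholder for a real site
    bool busy = false;
    std::vector<int> seen;           // ids of the events processed so far

    void process(const Event& ev) {
        assert(!busy);               // a site must never handle two events at once
        busy = true;
        seen.push_back(ev.id);       // stand-in for the real work
        busy = false;
    }
};

// One task per (event, site) pair instead of a barrier after every event.
void process_all(std::vector<Site>& sites, const std::vector<Event>& events) {
    Site* site = sites.data();       // raw pointers so site[s] can appear in depend()
    const Event* event = events.data();
    const std::size_t m = sites.size();
    const std::size_t n = events.size();

    #pragma omp parallel
    #pragma omp single
    for (std::size_t e = 0; e < n; ++e) {
        for (std::size_t s = 0; s < m; ++s) {
            // inout on site[s] serialises the tasks of one site;
            // tasks of different sites do not wait for each other.
            #pragma omp task depend(inout: site[s])
            site[s].process(event[e]);
        }
    }
    // The implicit barrier at the end of the parallel region waits for all tasks.
}
```

With this layout a fast site can be several events ahead of a slow one, while each individual site still sees its events one at a time. Without OpenMP enabled the pragmas are ignored and the loops simply run serially.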
What is the best way to implement this? I'm using C++ and am looking into TBB Flow Graph or multiple pipelines.
One more consideration: each event has to be read from disk and takes up some memory. Although it is not critical yet, I would like to keep as few events in the system at a time as possible (or at least limit their number). The current implementation holds only one (plus a couple being prepared in the background). Thanks
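To make the memory constraint concrete: a minimal sketch, assuming OpenMP 4.0 task dependencies, of how the number of events held in memory might be capped with a fixed ring of slots. `Site`, `Event`, `load_event` and `process_bounded` are hypothetical names used only for illustration, and `load_event` stands in for the real disk read:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Event { int id = -1; };       // placeholder for a chunk of data

struct Site {                        // placeholder for a real site
    std::vector<int> seen;
    void process(const Event& ev) { seen.push_back(ev.id); }
};

// Hypothetical loader; stands in for reading event e from disk.
Event load_event(std::size_t e) {
    Event ev;
    ev.id = static_cast<int>(e);
    return ev;
}

// Process n events on every site with at most max_in_memory events loaded.
void process_bounded(std::vector<Site>& sites, std::size_t n,
                     std::size_t max_in_memory) {
    assert(max_in_memory > 0);
    std::vector<Event> ring(max_in_memory);   // the only event storage
    Event* slot = ring.data();                // raw pointers for depend()
    Site* site = sites.data();
    const std::size_t m = sites.size();

    #pragma omp parallel
    #pragma omp single
    for (std::size_t e = 0; e < n; ++e) {
        const std::size_t k = e % max_in_memory;
        // Overwriting slot k waits for every reader of the event
        // that previously occupied that slot.
        #pragma omp task depend(out: slot[k])
        slot[k] = load_event(e);

        for (std::size_t s = 0; s < m; ++s) {
            // in on slot[k]: wait until the event is loaded;
            // inout on site[s]: one event at a time per site.
            #pragma omp task depend(in: slot[k]) depend(inout: site[s])
            site[s].process(slot[k]);
        }
    }
}
```

Here a site can run at most `max_in_memory - 1` events ahead of the slowest site, because the slot it needs next is not reloaded until the slowest site has finished with it. With `max_in_memory` set to 1 this degenerates to roughly the current behaviour.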