I am running the following loop using, say, 8 OpenMP threads:
float* data;
int n;
#pragma omp parallel for schedule(dynamic, 1) default(none) shared(data, n)
for ( int i = 0; i < n; ++i )
{
DO SOMETHING WITH data[i]
}
Due to NUMA, I'd like to run first half of the loop (i = 0, ..., n/2-1) with threads 0,1,2,3 and second half (i = n/2, ..., n-1) with threads 4,5,6,7.
Essentially, I want to run two loops in parallel, each loop using a separate group of OpenMP threads.
How do I achieve this with OpenMP?
Thank you
PS: Ideally, if threads from one group are done with their half of the loop, and the other half of the loop is still not done, I'd like threads from finished group join unsfinished group processing the other half of the loop.
I am thinking about something like below, but I wonder if I can do this with OpenMP and no extra book-keeping:
int n;
int i0 = 0;
int i1 = n / 2;
#pragma omp parallel for schedule(dynamic, 1) default(none) shared(data,n,i0,i1)
for ( int i = 0; i < n; ++i )
{
int nt = omp_get_thread_num();
int j;
#pragma omp critical
{
if ( nt < 4 ) {
if ( i0 < n / 2 ) j = i0++; // First 4 threads process first half
else j = i1++; // of loop unless first half is finished
}
else {
if ( i1 < n ) j = i1++; // Second 4 threads process second half
else j = i0++; // of loop unless second half is finished
}
}
DO SOMETHING WITH data[j]
}