#pragma omp parallel for ordered
for (int i = 0; i < n; ++i) {
    // ... code runs nicely in parallel here ...
    #pragma omp ordered
    {
        // ... one at a time, in order of i, as expected; good ...
    }
    // ... runs single-threaded here, but I expected it to be parallel ...
}
I expected the next thread to enter the ordered section as soon as the current thread left it. Instead, the next thread enters the ordered section only when the current thread reaches the end of the loop body, so the code after the ordered section runs serially.
The OpenMP 4.0 manual says:
The ordered construct specifies a structured block in a loop region that will be executed in the order of the loop iterations. This sequentializes and orders the code within an ordered region while allowing code outside the region to run in parallel.
The key phrase for me is "while allowing code outside the region to run in parallel". I'm reading "outside" to include the code after the ordered section ends.
Is this expected? Must the ordered section in fact be the last thing in the loop body?
I've searched for an answer and found one other place where someone observed something similar, nearly two years ago: https://stackoverflow.com/a/32078625/403310 :
Testing with gfortran 5.2, it appears everything after the ordered region is executed in order for each loop iteration, so having the ordered block at the beginning of the loop leads to serial performance while having the ordered block at the end of the loop does not have this implication as the code before the block is parallelized. Testing with ifort 15 is not as dramatic but I would still recommend structuring your code so your ordered block occurs after any code that needs parallelization in a loop iteration rather than before.
I'm using gcc 5.4.0 on Ubuntu 16.04.
Many thanks.