Background
I am relying on OpenMP parallelization and pseudo-random number generation in my program, but at the same time I would like the results to be perfectly replicable if desired (given the same number of threads).
I'm seeding a thread_local PRNG for each thread separately like this:
{
    std::minstd_rand master{};
    #pragma omp parallel for ordered
    for(int j = 0; j < omp_get_num_threads(); j++)
        #pragma omp ordered
        global::tl_rng.seed(master());
}
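For comparison, here is a minimal sketch of a seeding variant that does not depend on how loop iterations are assigned to threads. It is not what I currently use. The names tl_rng (a stand-in for global::tl_rng), makeSeeds and seedThreads are hypothetical, and it assumes that a thread_local variable is private to each OpenMP thread, which the snippet above assumes as well.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds when OpenMP is not enabled.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

using Seed = std::minstd_rand::result_type;

// Hypothetical stand-in for global::tl_rng from the snippet above.
thread_local std::minstd_rand tl_rng;

// Draw the seeds serially from the master generator, so that seeds[i]
// depends only on i and never on thread timing.
std::vector<Seed> makeSeeds(int nthreads) {
    std::minstd_rand master{};
    std::vector<Seed> seeds(static_cast<std::size_t>(nthreads));
    for (Seed& s : seeds) s = master();
    return seeds;
}

// Every thread of the team seeds its own generator with the entry that
// matches its thread number; no work-sharing loop is involved.
void seedThreads(const std::vector<Seed>& seeds) {
    #pragma omp parallel
    {
        const std::size_t id = static_cast<std::size_t>(omp_get_thread_num());
        assert(id < seeds.size());
        tl_rng.seed(seeds[id]);
    }
}
```

A call such as seedThreads(makeSeeds(omp_get_max_threads())) would then replace the ordered loop. The mapping from seed to thread is explicit here, so it does not rest on any scheduling behaviour.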
I've also come up with the following way of producing count elements and collecting them all in one array at the end in a deterministic order (results of thread 0 first, thread 1 next, etc.):
std::vector<Element> all{};
...
#pragma omp parallel if(parallel)
{
    std::vector<Element> tmp{};
    tmp.reserve(count/omp_get_num_threads() + 1);
    // generation loop
    #pragma omp for
    for(size_t j = 0; j < count; j++)
        tmp.push_back(generateElement(global::tl_rng));
    // collection loop
    #pragma omp for ordered
    for(int j = 0; j < omp_get_num_threads(); j++)
        #pragma omp ordered
        all.insert(all.end(),
                   std::make_move_iterator(tmp.begin()),
                   std::make_move_iterator(tmp.end()));
}
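As a point of comparison, the collection step can also be written without a second work-sharing loop, by keeping one buffer per thread number and concatenating the buffers serially after the parallel region. This is only a sketch under assumptions: Element, generateElement, tl_rng and generateAll are hypothetical stand-ins, and the schedule(static) clause on the generation loop is my addition, not part of the code above.

```cpp
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds when OpenMP is not enabled.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

// Hypothetical stand-ins for the names used in the snippet above.
using Element = std::minstd_rand::result_type;
thread_local std::minstd_rand tl_rng;
Element generateElement(std::minstd_rand& rng) { return rng(); }

std::vector<Element> generateAll(std::size_t count) {
    // One buffer per possible thread number, allocated before the region.
    std::vector<std::vector<Element>> perThread(
        static_cast<std::size_t>(omp_get_max_threads()));

    #pragma omp parallel
    {
        // Each thread appends only to the buffer matching its thread number.
        std::vector<Element>& tmp =
            perThread[static_cast<std::size_t>(omp_get_thread_num())];

        // generation loop (schedule(static) is an assumption of this sketch)
        #pragma omp for schedule(static)
        for (std::size_t j = 0; j < count; j++)
            tmp.push_back(generateElement(tl_rng));
    }

    // Serial collection: the order is thread 0, thread 1, ... by construction.
    std::vector<Element> all;
    all.reserve(count);
    for (std::vector<Element>& tmp : perThread)
        all.insert(all.end(),
                   std::make_move_iterator(tmp.begin()),
                   std::make_move_iterator(tmp.end()));
    return all;
}
```

In this form the collection order no longer depends on which thread runs which iteration of an ordered loop. Which generation iterations each thread receives is still decided by the schedule, so that part of the question remains open.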
The question
This seems to work, but I'm not sure whether it's reliable (read: portable). Specifically, if, for example, the second thread finishes its share of the generation loop early because its generateElement() calls happened to return quickly, won't it technically be allowed to pick up the first iteration of the collection loop? With my compiler that does not happen, and it's always thread 0 doing j = 0, thread 1 doing j = 1, etc., as intended. Does that follow from the standard, or is it allowed to be compiler-specific behaviour?
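The observation above can be checked with a small diagnostic that records which thread executes each iteration of an ordered loop shaped like the collection loop. This is a minimal sketch; iterationOwners is a hypothetical helper name, and whatever it reports describes one compiler and runtime only.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds when OpenMP is not enabled.
inline int omp_get_num_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

// Returns a vector where entry j is the number of the thread that
// executed iteration j of the ordered loop.
std::vector<int> iterationOwners() {
    std::vector<int> owner;
    #pragma omp parallel
    {
        // One thread sizes the vector; the implicit barrier at the end of
        // the single construct makes it visible before the loop starts.
        #pragma omp single
        owner.assign(static_cast<std::size_t>(omp_get_num_threads()), -1);

        // Same shape as the collection loop: one iteration per thread.
        #pragma omp for ordered
        for (int j = 0; j < omp_get_num_threads(); j++) {
            #pragma omp ordered
            owner[static_cast<std::size_t>(j)] = omp_get_thread_num();
        }
    }
    return owner;
}
```

If the returned vector is 0, 1, 2, ... on every run, that matches what I see, but it is still only an observation about one implementation and not evidence of a guarantee.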
I could not find much about the ordered clause on the for directive, except that it is required if the loop contains an ordered directive. Does it always guarantee that the threads will split the loop from the start in increasing thread_num order? Where does it say so in citable sources? Or do I have to make my "generation" loop ordered as well? Does that actually make a difference (performance- or logic-wise) when there's no ordered directive in it?
Please don't answer from experience or from how OpenMP would logically be implemented. I'd like an answer backed by the standard.