2

This is probably a really simple question, but I'm pretty new to random number generation in C++ and want to make sure I'm getting it right.

I have a stochastic function that I want to run in parallel, multiple times, so each parallel run of the function needs to be different from the others and different from previous runs. From my understanding, one way I could do this would be to use random_device as the seed for each, e.g.

for (int i = 0; i < runs; i++) {
    // do something
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < std::size(iters); j++) {
        std::mt19937 mrandThread(std::random_device{}());
        iters.at(j) = stochFunction(parameters, mrandThread);
    }
    // do something
}

However, this seems like it would be computationally expensive, as you would be constructing a random_device many times, especially if the above loop is repeated a lot. Another problem is that runs may be duplicated, since random_device only supplies the seed, and the same seed may come up again. On the other hand, passing the engine in from outside the loops, e.g.

std::mt19937 mrandThread(std::random_device{}());
for (int i = 0; i < runs; i++) {
    // do something
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < std::size(iters); j++) {
        iters.at(j) = stochFunction(parameters, mrandThread);
    }
    // do something
}

means that each thread gives the same result, as each one is just running a duplicate of the Mersenne Twister from the point where the threads were spawned. Another option I've seen is rand_r(), but could this again have a problem similar to seed duplication, or does it act more like branching off the current random trajectory set outside the loop?
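
For reference, this is roughly the rand_r() pattern I have in mind (just a sketch, not tested code; baseSeed and the wiring are illustrative):

#include <omp.h>
#include <stdlib.h>  // rand_r is POSIX, not standard C++

#pragma omp parallel
{
    // each thread owns its own seed state, so calls never interfere;
    // whether streams from nearby seeds are independent is another matter
    unsigned int seed = baseSeed + omp_get_thread_num();  // baseSeed: illustrative
    #pragma omp for schedule(static)
    for (int j = 0; j < (int)std::size(iters); j++)
        iters.at(j) = rand_r(&seed);  // advances this thread's own state
}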

Any advice on how best to implement this would be much appreciated.

ALs
  • You could simply assign each thread its own instance of the random generator (with a different seed). That way you only assign the seed once per thread and not for each iteration. – Qubit Jun 22 '18 at 12:13

2 Answers

0

I think something along the lines of `#pragma omp parallel private(mrandThread)` will do the trick. The `private` clause means:

The private directive declares data to have a separate copy in the memory of each thread. Such private variables are initialized as they would be in a main program. Any computed value goes away at the end of the parallel region. (However, see below.) Thus, you should not rely on any initial value, or on the value of the outer variable after the region.

(from: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-data.html)
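
A rough sketch of what I mean (untested; note that a `private` copy is default-constructed, so each thread should seed its own copy inside the region rather than relying on the outer value):

#include <omp.h>
#include <random>

std::mt19937 mrandThread;  // the outer value is NOT carried into the threads

#pragma omp parallel private(mrandThread)
{
    // seed each thread's private copy; without this, every copy would
    // start from the same fixed default seed
    mrandThread.seed(std::random_device{}());

    #pragma omp for schedule(static)
    for (int j = 0; j < (int)std::size(iters); j++)
        iters.at(j) = stochFunction(parameters, mrandThread);
}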

virgesmith
  • Should be careful though: the default seed is a fixed value (if I read the documentation right), and you don't want to have several identical generators - that would defeat the purpose of running this in parallel. – Qubit Jun 22 '18 at 12:34
  • My understanding is `std::random_device` *should* take care of this, but perhaps not on all platforms. See https://en.cppreference.com/w/cpp/numeric/random/random_device. Definitely worth double-checking. – virgesmith Jun 22 '18 at 12:49
  • I don't know; I understand it as: initialise once outside the loop, then create a copy for each thread - not what you want. But I might be wrong; it should be trivial to check. – Qubit Jun 22 '18 at 12:59
  • Thanks for your help both, I'm going through the documentation and seeing how it works etc. Initializing outside the loop seems to duplicate the generators, leading to identical runs, but I may have missed something. – ALs Jun 22 '18 at 13:03
  • As it should. Just create a parallel region where each thread gets a private generator and then iterates over the cases, or you can be really lazy and create an array/vector of generators (see the sketch after these comments). There's some overhead here, but whether you should care about that depends on the complexity of your `stochFunction`. – Qubit Jun 22 '18 at 13:13
  • interesting how omp seems to create one instance then copy it to each thread... what happens if copy/assign is disabled (deliberately, as would be sensible for an RNG)? – virgesmith Jun 22 '18 at 14:17
  • A quick test indicates that this does not produce the desired result. – Jerry Coffin Jun 22 '18 at 15:00
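
For reference, a minimal sketch of the vector-of-generators variant suggested above (seeding each engine from random_device is one choice among several):

#include <omp.h>
#include <random>
#include <vector>

// one engine per thread, seeded up front on a single thread
std::vector<std::mt19937> engines;
for (int t = 0; t < omp_get_max_threads(); t++)
    engines.emplace_back(std::random_device{}());

#pragma omp parallel for schedule(static)
for (int j = 0; j < (int)std::size(iters); j++) {
    // each thread only ever touches its own engine
    iters.at(j) = stochFunction(parameters, engines[omp_get_thread_num()]);
}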
0

If you want to assure a unique seed for each thread, I'd probably generate the seeds ahead of time, then use them as you start up the threads:

#include <random>
#include <set>
#include <vector>

std::vector<unsigned> gen_seeds(std::size_t num) {
    // start by stuffing them into a set to guarantee uniqueness
    // (unsigned matches std::random_device::result_type, avoiding narrowing)
    std::set<unsigned> s;

    std::random_device g;

    while (s.size() < num)
        s.insert(g());

    // then return them in a vector to give random access:
    std::vector<unsigned> seeds(s.begin(), s.end());
    return seeds;
}

// generate N unique seeds:
auto seeds = gen_seeds(std::size(iters));

#pragma omp parallel for schedule(static)
for (int j = 0; j < (int)std::size(iters); j++) {
    // and use each to seed a generator:
    std::mt19937 mrandThread(seeds[j]);
    iters.at(j) = stochFunction(parameters, mrandThread);
}

In theory, calling random_device in each thread might be a tiny bit faster, but it's honestly unlikely--it'll normally be a hardware device with serialized access in any case. By doing it ahead of time in a single thread (and using a set, or similar) it's trivial to assure that each seed is truly unique, which is probably more important than saving a microsecond on generating the seeds anyway.

Oh, one other point: using a single result from random_device to seed a Mersenne Twister isn't entirely optimal either. The generator classes can tell you the size of their seed data--ideally, you want to seed them with that much data.
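
For example, something along these lines (a sketch; std::seed_seq loses some entropy by design, but it's the standard way to spread seed material across the whole state):

#include <algorithm>
#include <array>
#include <functional>
#include <random>

// gather enough entropy to fill the mt19937's entire 624-word state
std::array<std::random_device::result_type, std::mt19937::state_size> data;
std::random_device rd;
std::generate(data.begin(), data.end(), std::ref(rd));

std::seed_seq seq(data.begin(), data.end());
std::mt19937 engine(seq);  // seeded with full-state-sized input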

Jerry Coffin
  • And how are you sure the sequences won't overlap? – Severin Pappadeux Jun 22 '18 at 17:46
  • @SeverinPappadeux: At least as it stands, you're not--but the question doesn't mention any desire to prevent that from happening. If you really did want to prevent it, you'd probably want to start by generating all the inputs as a single stream, then spawn the threads, passing each its own part of that single stream (see the sketch below). – Jerry Coffin Jun 22 '18 at 17:55
  • `but the question doesn't mention any desire to prevent that from happening` Really?!? The question said `so each parallel run of the function needs to be different from the other and different from previous runs` - how could it be different if the sequences overlap? – Severin Pappadeux Jun 22 '18 at 18:44
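
A rough sketch of the single-stream idea from the comment above (drawsPerIter and the stochFunctionPre variant, which consumes pre-drawn numbers, are hypothetical):

#include <random>
#include <vector>

// draw everything sequentially from one generator, so the combined
// output is a single stream with no possibility of overlap
std::mt19937 master(std::random_device{}());
std::uniform_real_distribution<double> dist(0.0, 1.0);
const std::size_t drawsPerIter = 100;  // hypothetical per-run budget
std::vector<double> draws(std::size(iters) * drawsPerIter);
for (auto& d : draws)
    d = dist(master);

#pragma omp parallel for schedule(static)
for (int j = 0; j < (int)std::size(iters); j++) {
    // each iteration consumes only its own disjoint slice of the stream
    iters.at(j) = stochFunctionPre(parameters, &draws[j * drawsPerIter], drawsPerIter);
}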