0

There has been a large number of parallel RNG questions here, but I couldn't find one which addresses my variant.

I'm writing a function that given a seed fills a long array with random numbers based on that seed. I currently do that serially, but I find that the RNG is taking a significant amount of the running time of my program. I therefore want to speed up my function by using multiple threads. But I want this to be transparent to the user. That is, given a seed, one should get the same random number sequence out independent of the number of threads the function uses internally.

My current idea for doing this is to divide the array into chunks (independently of the number of threads), and generate a new RNG for every chunk, for example by seeding each RNG with the seed+chunk_id. The chunks could then be processed independently, and it would not matter which thread processes which chunk. But I am worried that this might reduce the quality of the RNG. Is this a safe way of doing this for a high-quality RNG like mersenne twister?

To illustrate, here is some pseudo-code for the process:

function random(array, seed, blocksize=100000)
  for each block of size blocksize in array
    rng[block] = create_rng(seed+i)
  parallel for each block in array
    for each sample in block
      array[sample] = call_rng(rng[block])

This should produce the same values for each (seed,blocksize) combination. But is this the best way of doing this?

amaurea
  • 4,950
  • 26
  • 35
  • I would suggest instead using one RNG just to generate the seeds for the other RNGs. You will have to know a bit about the internals of the RNG to ensure your use doesn't conflict with its logic. – David Schwartz Apr 05 '14 at 11:15
  • Why would that be better? The numbers 1,2,3.. should be far away from each other in the state space of the RNG, shouldn't they? – amaurea Apr 05 '14 at 11:17
  • That's equivalent to saying that the RNG would not be weakened by replacing it with an algorithm that seeds it with successive integers and just generates one output from each one. I doubt that's true. (But if you know for a fact the RNG has that property, then go for it.) – David Schwartz Apr 05 '14 at 11:18
  • No, that's not what I was trying to say. I asked this question because I was worried about this issue. What I wondered about was why seeding them using another RNG would be better. There are some simple cases where it would demonstrably be worse, such as for a simple rng that just uses the last returned value as its state. In that case, you would end up with each sub-rng producing the same sequence as it siblings, just with different offsets. I understand that that's why you said one needed to take the internals of the RNG into account, but that seems to be begging the question. – amaurea Apr 05 '14 at 11:23
  • If the RNG is good, then the sequence of outputs it produces will be good and random. Thus they'll be good seeds -- that's what you want. I don't see how that argument applies for seeds that differ by one. (If the RNG has the property that its next output can be determined from just its previous output, it's a piece of junk (in this context, that's not to say such things are never useful).) It's reasonable for an RNG to require a random seed. – David Schwartz Apr 05 '14 at 11:24

1 Answers1

1

I tested the effective RNG quality of this approach using the TestU01 random number generator test suite by constructing a custom RNG which is reseeded with a new sequential seed every 0x1000 steps:

#include <stdlib.h>
#include "ulcg.h"
#include "unif01.h"
#include "bbattery.h"

long long i=1,j=0;
unif01_Gen * gen;

unsigned long myrand()
{
  if(++i&0xfff==0)
  {
    ugfsr_DeleteGen(gen);
    gen = ugfsr_CreateMT19937_02(++j, NULL, 0);
  }
  return gen->GetBits(gen->param, gen->state);
}

int main()
{
  unif01_Gen *gen2 = unif01_CreateExternGenBitsL("foo", myrand);
  gen = ugfsr_CreateMT19937_02(1, NULL, 0);
  bbattery_Crush (gen2);
  return 0;
}

Result (after waiting 40 minutes for the tests to complete):

      Test                          p-value
----------------------------------------------
71  LinearComp, r = 0              1 - eps1
72  LinearComp, r = 29             1 - eps1
----------------------------------------------
All other tests were passed

These are the same tests Mersenne Twister fails even when used normally, when not reseeding. So the TestU01 Crush test could not distinguish the sequential seeding scenario from normal usage.

I also tested the approach of reseeding with the output from another Mersenne Twister instead of using sequential integers. The result was exactly the same.

While I did not try the most time-consuming "BigCrush" test (which takes 8 hours), I think it's safe to say that the quality of MT is not significantly impaired by generating sub-RNGs with sequential seeds, as suggested in the question.

amaurea
  • 4,950
  • 26
  • 35