1

In “Dynamic Creation of Pseudorandom Number Generators” Matsumoto and Nishimura cautioned against careless initialization of MT PRNGs for the purposes of parallel simulations which assume the number streams from the different generators are independent of each other:

The usual scheme for PRNG in parallel machines is to use one and the same PRNG for every process, with different intial seeds. However, this procedure may yield bad collision, in particular if the generator is based on a linear recurrence, because in this case the sum of two pseudorandom sequences satisfies the same linear recurrence, and may appear in the third sequence. The danger becomes non-negligible if the number of parallel streams becomes large compared to the size of the state space.

How big does the number of streams have to get for this to be a serious problem? In the case of the standard MT, MT19937, the state space is rather large... I could definitely see there being a linear relationship modulo 219937−1 among 20,000 sequences, but how about a birthday-paradox-style relationship among 400 sequences?

It appears that this is actually a serious problem, because parallel PRNG implementations do include DCMT, but it would be nice to have some examples of failure, and some sense for when this becomes a problem.

Alex Coventry
  • 68,681
  • 4
  • 36
  • 40

1 Answers1

1

There was a discussion of this issue here.

Take two streams "far apart" in MT space, and their sum also satisfies the recurrence. So a possible worry about a third stream isn't just about correlation or overlap with the first two streams, but, depending on the app, also about correlation/overlap with the sum of the first two streams. Move to N streams, and there are O(N**2) direct sums to worry about, and then sums of sums, and ...

Still won't cause a problem in my statistical life expectancy, but I only have 4 cores ;-)

So it's worse than birthday-paradox. The problem is actually pretty likely with log(size of state space)/log(2) sequences, which for the standard MT is about 14 sequences.

Alex Coventry
  • 68,681
  • 4
  • 36
  • 40