Pseudo-random number generator for cluster environment

Question

How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I can have many compute nodes (e.g. 100), and I need to generate millions of numbers on each node. I need a warranty that a PRN sequence on one node will not overlap the PRN sequence on another node.

I could generate all PRN on root node, then send them to other nodes. But it would be far too slow.
I could jump to a know distance in the sequence, on each node. But is there such an algorithm for Mersenne-Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?
I could use different generators on each node. But is it possible with good generators like Mersenne-Twister? How could it be done?
Any other though?

Related: http://cstheory.stackexchange.com/questions/6903/parallel-pseudorandom-number-generators — Jukka Suomela, Jun 16 '11 at 22:31
@Jukka Suomela: maybe you should post your own answer on cstheory mentioning that we have now solved this problem here — jopasserat, Jun 17 '11 at 08:08

jopasserat · Accepted Answer · 2011-06-16T07:30:16.967

11

You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistic quality.

Fortunately, Mersenne Twister (MT) will help you in your distribution task. Using its dedicated algorithm, called Dynamic Creator (DC hereafter), you can create independent random number generators that will produce highly independent random streams.

Each stream will be created on the node that will be using it. Basically, think of DC as a constructor in object oriented paradigm that creates different instances of MT. Each different instance is designed to produce highly independent random sequences.

You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It's quite straightforward to use and you'll be able to fix different parameters such as the number of different MT instances you want to obtain or the period of these MTs. Depending on its input parameter, DC will runtime will change.

In addition of the README coming along with DC, take a look at the file example/new_example2.c in the DC archive. It shows example of calls to get independent sequences given a different input identifier, which is basically what you have to identify cluster jobs.

Finally, if you intend to learn more about how to use PRNGs in parallel or distributed environments, I suggest you read this scientific articles:

Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010

edited Jun 16 '11 at 07:30

answered Jun 16 '11 at 07:13

jopasserat

5,721
4
31
50

1

DCMT is what I actually use. It uses the node number to create the generator polynomial, so every random sequence is independent. But is there any proof of that? What I remember from the original article on DCMT, is that they didn't proved it works, they just supposed it should work. – Charles Brunet Jun 16 '11 at 12:22
2

I wish there were a mathematical proof. Unfortunately, there isn't! I'm doing my PhD on stochastic simulations in High Performance environments, so I have widely studied this problem. Basically, if you do not have memory constraints (MT's family state vectors are quite large), this approach is the actual best to ensure high independence between your sequences. Other approach are surveyed in the piper I cited above, but the author champions DCMT's parameterization approach. – jopasserat Jun 16 '11 at 12:42
By the way, what are the best practices to follow if I want to pre-compute the generators? Should I run `get_mt_parameter_id_st` and save all parts of the `mt_struct` in a file, *excluding* the `state` vector; then later load the struct from a file; and finally initialise the state vector with `sgenrand_mt`? Can I perhaps omit some other parts of `mt_struct`? – Jukka Suomela Jun 17 '11 at 00:20
For independence purposes, you should exactly do so: pre-compute the generators and save all parts of the mt_struct data structure in a file. This way you have highly independent sequences ready to use. Then, you can also save the state vectors. Depending on the state vector, you start at a different point in the sequence, but some parts may be of better quality than others (typically, older versions of MT had problems when their initial state were full of zeros). – jopasserat Jun 17 '11 at 06:02
What about using the more recent [MTGP](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/) instead of DCMT? It seems that the MTGP distribution comes with 128 pre-computed parameter sets, and it also comes with a DC that can be used to construct more, if necessary. – Jukka Suomela Jun 17 '11 at 15:45
You can do so, I think there is 200 pre-computed parameters sets with MTGP. In my lab we have pre-computed and verified 10000 parameters sets for MTGP 2^3217, we should soon be able to provide them freely on the web soon. – jopasserat Jun 17 '11 at 17:28
1

[TinyMT](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html) might be a good alternative. – jj1bdx Mar 16 '12 at 16:03
Is there a 64-bit version? I need 1000 independent 64-bit MT generators? – max Jul 13 '16 at 17:01
@max something like that? http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html – jopasserat Jul 18 '16 at 10:18
I meant, there is 64-bit MT, but I could not find 64-bit MT with DC. – max Jul 18 '16 at 23:04
1

the first parameter of the `get_mt_parameter` function in DC allows you to specify the word size – jopasserat Jul 19 '16 at 14:57

score 1 · Answer 2 · answered Mar 01 '13 at 03:08

I could jump to a know distance in the sequence, on each node. But is there such an algorithm for Mersenne-Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?

Yes, see http://theo.phys.sci.hiroshima-u.ac.jp/~ishikawa/PRNG/mt_stream_en.html. This is a excellent solution to obtaining independent random number streams. By making jumps that are larger than the number of random numbers needed from each stream to create the starts of each stream, the streams won't overlap.

score 1 · Answer 3 · 2011-06-16T06:16:22.270

Okay, answer #2 ;-)

I am going to say ... keep it simple. Just use a "short" seed to prime the MT (imagine that this seed is 2³² bits for lack of better restriction). This assumes that the the short seed generates "sufficiently distributed" MT starting states (e.g. init_genrand in the code in my other answer, hopefully). This doesn't guarantee an equally distributed starting state but rather goes for "good enough", see below.

Each node will use it's own sequence of seeds which are pre-selected (either a list of random seeds which is transmitted or a formula like number_nodes * node_number * iteration). The important thing is that the initial "short" seed will never be re-used across nodes.

Each node will then use a MT PRNG initialized with this seed n times where n <<< MT_period / max_value_of_short_seed (TT800 is 2⁸⁰⁰-1 and MT19937 is 2¹⁹⁹³⁷-1, so n can still be a very large number). After n times, the node moves onto the next seed in the chosen list.

While I do not (nor can I) provide a "guarantee" that no node will ever have a duplicate sequence at the same time (or at all), here is what AMD says about Using Different Seends: (Obviously the initial seeding algorithm plays a role).

Of the four methods for creating multiple streams described here, this is the least satisfactory ... For example, sequences generated from different starting points may overlap if the initial values are not far enough apart. The potential for overlapping sequences is reduced if the period of the generator being used is large. Although there is no guarantee of the independence of the sequences, due to its extremely large period, using the Mersenne Twister with random starting values is unlikely to lead to problems, especially if the number of sequences required is small ...

Happy coding.

The four solutions proposed by AMD are true only in their library case. But when you're dealing with the original MT implementation, like @Charles Brunet does, parameterization through Dynamic Creator is the most appropriate way to handle independent Mersenne Twister streams. — jopasserat, Jun 16 '11 at 16:00

score 0 · Answer 4 · answered Aug 10 '13 at 17:38

TRNG is a random number generator built specifically with parallel cluster environments in mind (specifically it was built for the TINA super computer in Germany). Hence it is very eas to create independent random number streams and also generate non standard distributions. There is a tutorial on how to set it up here: http://www.lindonslog.com/programming/parallel-random-number-generation-trng/

score 0 · Answer 5 · 2011-06-16T05:33:32.787

Disclaimer: I am not sure what guarantee MT has in terms of cycle overlap when starting from an arbitrary "uint" (or x, where x is a smaller arbitrary but unique value) seed, but that may be worth looking into, as if there is a guarantee then it may be sufficient to just start each node on a different "uint" seed and the rest of this post becomes largely moot. (The cycle length/period of MT is staggering and dividing out UINT_MAX still leaves an incomprehensible -- except on paper -- number.)

Well, here goes my comments to answer...

I like approach #2 with a pre-generated set of states; the MT in each node is then initialized with a given starting state.

Only the initial states must be preserved, of course, and once this is generated these states can

Be re-used indefinitely, if requirements are met, or;
The next states can generated forward on an external fast box why the simulation is running or;
The nodes can report back the end-state (if reliable messaging, and if sequence is used at same rate among nodes, and meets requirements, etc)

Considering that MT is fast to generate, I would not recommend #3 from above as it's just complicated and has a number of strings attached. Option #1 is simple, but might not be dynamic enough.

Option #2 seems like a very good possibility. The server (a "fast machine", not necessarily a node) only needs to transmit the starting state of the next "unused sequence block" (say, one billion cycles) -- the node would use the generator for one billion cycles before asking for a new block. This would make it a hybrid of #1 in the post with very infrequent messaging.

On my system, a Core2 Duo, I can generate one billion random numbers in 17 seconds using the code provided below (it runs in LINQPad). I am not sure what MT variant this is.

void Main()
{
    var mt = new MersenneTwister();
    var start = DateTime.UtcNow;
    var ct = 1000000000;
    int n = 0;
    for (var i = 0; i < ct; i++) {      
        n = mt.genrand_int32();
    }
    var end = DateTime.UtcNow;
    (end - start).TotalSeconds.Dump();
}

// From ... and modified (stripped) to work in LINQPad.
// http://mathnet-numerics.googlecode.com/svn-history/r190/trunk/src/Numerics/Random/MersenneTwister.cs
// See link for license and copyright information.
public class MersenneTwister
{
    private const uint _lower_mask = 0x7fffffff;
    private const int _m = 397;
    private const uint _matrix_a = 0x9908b0df;
    private const int _n = 624;
    private const double _reciprocal = 1.0/4294967295.0;
    private const uint _upper_mask = 0x80000000;
    private static readonly uint[] _mag01 = {0x0U, _matrix_a};
    private readonly uint[] _mt = new uint[624];
    private int mti = _n + 1;

    public MersenneTwister() : this((int) DateTime.Now.Ticks)
    {
    }       
    public MersenneTwister(int seed)
    {
                init_genrand((uint)seed);
    }       

    private void init_genrand(uint s)
    {
        _mt[0] = s & 0xffffffff;
        for (mti = 1; mti < _n; mti++)
        {
            _mt[mti] = (1812433253*(_mt[mti - 1] ^ (_mt[mti - 1] >> 30)) + (uint) mti);
            _mt[mti] &= 0xffffffff;
        }
    }

    public uint genrand_int32()
    {
        uint y;
        if (mti >= _n)
        {
            int kk;

            if (mti == _n + 1) /* if init_genrand() has not been called, */
                init_genrand(5489); /* a default initial seed is used */

            for (kk = 0; kk < _n - _m; kk++)
            {
                y = (_mt[kk] & _upper_mask) | (_mt[kk + 1] & _lower_mask);
                _mt[kk] = _mt[kk + _m] ^ (y >> 1) ^ _mag01[y & 0x1];
            }
            for (; kk < _n - 1; kk++)
            {
                y = (_mt[kk] & _upper_mask) | (_mt[kk + 1] & _lower_mask);
                _mt[kk] = _mt[kk + (_m - _n)] ^ (y >> 1) ^ _mag01[y & 0x1];
            }
            y = (_mt[_n - 1] & _upper_mask) | (_mt[0] & _lower_mask);
            _mt[_n - 1] = _mt[_m - 1] ^ (y >> 1) ^ _mag01[y & 0x1];

            mti = 0;
        }

        y = _mt[mti++];

        /* Tempering */
        y ^= (y >> 11);
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= (y >> 18);

        return y;
    }
}

Happy coding.

When you seed MT with a uint, it just generate its internal state using a linear congruent generator. So you you have any idea what is the proximity of two states using two different seeds. When doing Monte Carlo simulations, I'm looking for very rare events. If I need a probability of 1e-12, I need at least 1e14 random numbers. Your solution could work, but is definitively not the best approach. — Charles Brunet, Jun 16 '11 at 12:13

Pseudo-random number generator for cluster environment

5 Answers5

Linked