8

Ruby implements PRNGs as "a modified Mersenne Twister with a period of 2**19937-1." 1

The way I understand MT is that it operates on 2^32 different seeds. What confuses me is that Random.new(seed) accepts arbitrarily big numbers such as Random.new(2**100).

However, I wasn't able to find (logical) collisions:

Random.new(1).rand(10**5) == Random.new(2**32-1).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32+1).rand(10**5) => false

Given that we'd like to utilize MT's maximum seed range in the sense that we want to use as many different seeds as possible while still avoiding collisions with two different seeds, what seed range achieves this?

I tried understanding what is happening inside the Ruby's random implementation, but didn't get too far. https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
randomguy
  • 12,042
  • 16
  • 71
  • 101
  • 2
    It uses a vector of 624 32-bit integers internally (I *think* - at least that is what default implementations of MT would use). The code you linked splits up the big integer into an array of 32-bit integers that feed the initial state vector. – Neil Slater Nov 19 '13 at 16:21
  • Note 624 * 32 = 19968 . . . the "seed" is also the "state" for MT – Neil Slater Nov 19 '13 at 16:29
  • @NeilSlater: So wait. Does this mean that Random.new(1) will at some point start generating the same sequence as Random.new(1000)? – randomguy Nov 19 '13 at 16:35
  • I wonder if `rb_to_int(vseed);` does some normalization? – randomguy Nov 19 '13 at 16:42
  • 3
    Yes, there is a single sequence which repeats, the seeds just pick up at a different position. However, the states are not generally as close together as small seeds. It is very unlikely you would see a collision in practice - if you just set the generator running from `srand(1)` vs `srand(1000)` and got a billion results per second, we'd all be long dead by the time there was an overlap between the sequences. The available space is huge. This is different issue to knowing "where am I in the sequence", which is about seeing enough variation to *identify* the state. – Neil Slater Nov 19 '13 at 16:44
  • Neil Slater: Ah, that makes sense. Just to crystallize: With Ruby's implementation, we can choose seed range to be up to a 19968 bit number with the expense of the unique sequence length? – randomguy Nov 19 '13 at 16:52
  • I think that's close. Having a look myself, but not able to generate repeat sequences yet, which would demonstrate it with an answer to your question. – Neil Slater Nov 19 '13 at 17:02
  • Seems like too much magic. I might just use a simple c/c++ MT implementation and interface with that from Ruby, just to be on the safe side. – randomguy Nov 19 '13 at 17:04

1 Answers1

11

The Mersenne Twister sequence is 2 ** ( 624 * 32 - 1 ) - 1 long, and the seed value is used to set an internal state for the PRNG that directly relates to the position within that sequence.

The easiest-to-find repeat appears to be every 2 ** ( 624 * 32 ), and can be shown to work like this:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 r17 = Random.new( start_value + 17 * repeat_every )

 r23 = Random.new( start_value + 23 * repeat_every )

 r1.rand == r2.rand  
 # true

 r17.rand == r23.rand  
 # true

Or try this:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 Array.new(10) { r1.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

 Array.new(10) { r2.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

The repeat value relates to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked packs a Ruby Fixnum into an array - the magic command is

  rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0,
        INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER );

however, this isn't something easy to play with, it is defined in internal.h, so only really accessible if you work on Ruby interpreter itself. You cannot access this function from within a normal C extension.

The packed integer is then loaded to the MT's internal state by the function init_by_array. This is quite a complex-looking function - the packed seed value is not written literally into the state, but instead the state is generated item by item, adding in the supplied array values, using a variety of xors, additions and cross-referencing the previous value (the Ruby source here also adds in the packed array's index position, commented "non-linear", I think that is one of the referenced modifications to standard MT)

Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 ) - the repeat_every value I show here is skipping over 2 sequences at a time, but it is the easiest to find repeating seed value, because it is easy to see how it sets the internal state exactly the same (because the first 624 items in the array representation of the seed are identical, and that is all that gets used later). Other seed values will also produce the same internal state, but the relationship is a complex mapping that pairs up each integer in the 19938-bit space with another integer which creates the same state for MT.

Neil Slater
  • 26,512
  • 6
  • 76
  • 94