
I need to generate many hundreds of millions of random numbers for a clustering analysis. I am using numpy.random and was wondering if anyone knows the maximum number of pseudo-randoms that can be generated with numpy.random before the sequence begins to repeat? A quick look in the numpy documentation didn't help.

I know I can generate numbers in chunks using different seeds, but I'm curious as to the maximum number.

Wai Ha Lee
aim
  • Do you need _unique_ random numbers? – Patrick Artner Nov 05 '18 at 23:38
  • `numpy.random` provides many functions. What does your actual RNG look like? – wim Nov 05 '18 at 23:41
  • What kind of errors are significant for your clustering analysis? – Mitch Wheat Nov 05 '18 at 23:50
  • @PatrickArtner -- yes, I need all numbers to be unique. No repeats. – aim Nov 05 '18 at 23:52
  • @wim -- I'm not sure what you mean. I was simply using the default numpy.random.rand(N) set-up. Is that – aim Nov 05 '18 at 23:53
  • " I need all numbers to be unique. " - then simply relying on a random number sequence will not be sufficient. – Mitch Wheat Nov 05 '18 at 23:56
  • @MitchWheat -- so what would you recommend? – aim Nov 05 '18 at 23:56
  • @aim If you want all numbers to be unique, then the answer is pretty much independent of the actual RNG used: e.g. with 64-bit numbers there are at most 2^64 distinct values, after which a previously sampled number must repeat. If we're talking about the whole SEQUENCE of random numbers, then it would be what I wrote in my answer. – Severin Pappadeux Nov 06 '18 at 00:16
  • Thanks all -- very helpful and informative answers/comments. – aim Nov 06 '18 at 01:27

1 Answer

3

It is, I believe, the Mersenne Twister (MT19937), with period 2^19937 - 1.

https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.set_state.html
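As a rough sanity check, the sketch below inspects the global generator's state through `numpy.random.get_state()` and compares the period with the number of draws mentioned in the question. The figure `n_draws` (one billion) is an illustrative assumption, and so is the estimate that each double from `numpy.random.rand` consumes two 32-bit generator outputs.

```python
import numpy as np

# The legacy global generator reports its algorithm name and key array.
state = np.random.get_state()
algorithm = state[0]   # name of the underlying bit generator
key = state[1]         # internal state words of the twister

# Period of MT19937 as an exact Python integer.
period = 2**19937 - 1

# Hypothetical workload: one billion doubles from numpy.random.rand.
n_draws = 10**9
# Assumption: each double is built from two 32-bit outputs.
outputs_used = 2 * n_draws

print(algorithm, len(key))
print("period exceeds workload:", period > outputs_used)
print("period bit length:", period.bit_length())
```

Note that a long period only says when the whole sequence starts over; it does not guarantee that individual values are unique, which is the separate issue raised in the comments.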

Severin Pappadeux