13

I'm using sequential seeds (1,2,3,4,...) for generation of random numbers in a simulation. Does the fact that the seeds are near each other make the generated pseudo-random numbers similar as well?

I don't think it makes a difference, but I'm using Python.

Edit: I have done some tests and the numbers don't look similar. But I'm afraid that the similarity cannot be noticed just by looking at the numbers. Is there any theoretical feature of random number generation that guarantees that different seeds give completely independent pseudo-random numbers?

Homero Esmeraldo
  • What you can do if you don't trust the RNG (a bit of a hack, I admit) is pass the seed through the SHA1 algorithm from `hashlib`; that's designed to map similar values to completely distinct ones. – Fred Foo Jun 05 '12 at 18:56
  • I've noticed that effect but I think it was in Microsoft C++, not Python. I believe the `random` module uses better algorithms. – Mark Ransom Jun 05 '12 at 20:28
  • https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/ this is an interesting post that I found recently :-) – Homero Esmeraldo Jan 17 '19 at 01:52
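The hashing idea from the first comment above can be sketched as follows. This is only an illustration: `scrambled_seed` is a hypothetical helper name, and truncating the SHA-1 digest to 64 bits is an arbitrary choice made here, not something the comment specifies.

```python
import hashlib
import random

def scrambled_seed(n):
    """Map an integer to a 64-bit seed via SHA-1 (hypothetical helper)."""
    digest = hashlib.sha1(str(n).encode("ascii")).digest()
    # Keep the first 8 bytes of the digest; an arbitrary choice for this sketch.
    return int.from_bytes(digest[:8], "big")

# Sequential run numbers 1..4 become unrelated-looking seeds.
rngs = [random.Random(scrambled_seed(n)) for n in range(1, 5)]
```

The mapping is deterministic, so a run can still be reproduced from its run number alone.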

6 Answers

4

By definition, the generated numbers are a deterministic function of the seed, so there is always some relationship between the two. The real question is whether the algorithm scrambles the seed well enough that the outputs from different seeds appear uncorrelated. To answer that, you should study the standard methods for evaluating randomness.

You are right to be concerned, though. Here are the results from Microsoft's C++ rand function with seed values from 0 to 9, one row per seed. Notice that the first value in each row grows almost linearly with the seed:

   38  7719 21238  2437  8855 11797  8365 32285 10450 30612
   41 18467  6334 26500 19169 15724 11478 29358 26962 24464
   45 29216 24198 17795 29484 19650 14590 26431 10705 18316
   48  7196  9294  9091  7031 23577 17702 23503 27217 12168
   51 17945 27159   386 17345 27504 20815 20576 10960  6020
   54 28693 12255 24449 27660 31430 23927 17649 27472 32640
   58  6673 30119 15745  5206  2589 27040 14722 11216 26492
   61 17422 15215  7040 15521  6516 30152 11794 27727 20344
   64 28170   311 31103 25835 10443   497  8867 11471 14195
   68  6151 18175 22398  3382 14369  3609  5940 27982  8047
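
The pattern in the first column is what a plain linear congruential generator produces. The sketch below assumes the constants commonly attributed to Microsoft's C runtime `rand` (multiplier 214013, increment 2531011, output taken from bits 16 to 30 of the state). Those constants are an assumption of this example, not something stated in the answer, and `ms_rand_stream` is a hypothetical name.

```python
def ms_rand_stream(seed, count):
    """Return `count` outputs of an LCG using the constants commonly
    attributed to Microsoft's C runtime rand() (an assumption here)."""
    state = seed
    out = []
    for _ in range(count):
        state = (state * 214013 + 2531011) % 2**32
        out.append((state >> 16) & 0x7FFF)  # 15-bit output from the high bits
    return out

# First output for each of the seeds 0..9
first_outputs = [ms_rand_stream(seed, 1)[0] for seed in range(10)]
print(first_outputs)
```

Because the first output is just a shifted copy of `seed * 214013 + 2531011`, nearby seeds give nearby first values, and only later outputs diverge.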
Mark Ransom
  • `rand` is notoriously unreliable on some platforms. Python's `random` module uses a Mersenne Twister algorithm, which is considered not good enough for crypto, but a lot better than most implementations of `rand`. – Fred Foo Jun 06 '12 at 08:57
1

If you are worried about sequential seeds, then don't use sequential seeds. Set up a master RNG, with a known seed, and then take successive outputs from that master RNG to seed the various child RNGs as needed.

Because you know the initial seed for the master RNG, the whole simulation can be run again, exactly as before, if required.

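import random

master_seed = 42
master_rng = random.Random(master_seed)  # master RNG with a known seed

n_children = 4  # hypothetical count; use one child RNG per simulation run

# Seed each child RNG from the master RNG's next output
child_rngs = [random.Random(master_rng.getrandbits(64)) for _ in range(n_children)]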
rossum
  • I'm not sure this is such a good idea, because you run the risk of duplicating seeds. For example, that would be very likely if you're generating 16-bit numbers and running thousands of processes. – Will May 09 '17 at 18:05
  • The question asks about sequential seeds, not duplicate seeds. If duplicate seeds are a problem then use a 128-bit block cipher and encrypt the numbers 0, 1, 2, 3, ... 2^128-1. Being a cipher, the numbers are guaranteed not to duplicate for a very long time, until the counter rolls over. A different key will give a different permutation of the numbers. To duplicate the permutation use the same key. – rossum May 09 '17 at 20:13
  • -1 because of the risk of duplicating seeds. Better include the method that doesn't have the risk of duplication in the answer, with the code, instead of just commenting. – Homero Esmeraldo Jan 17 '19 at 22:39
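
The duplicate-seed concern in these comments goes away if a counter is passed through any one-to-one mapping. Python's standard library has no block cipher, so the sketch below swaps in a non-cryptographic stand-in: a splitmix64-style 64-bit mixing function. It is a bijection because it is built only from xor-shifts and multiplications by odd constants. The name `mix64` and the choice of this particular mixer are assumptions of the example, not the cipher approach from the comment.

```python
import random

MASK64 = (1 << 64) - 1

def mix64(counter):
    """Bijective 64-bit mixer (splitmix64-style finalizer, not a cipher)."""
    z = counter & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

# Distinct counters always map to distinct seeds.
seeds = [mix64(i) for i in range(1, 1001)]
child_rngs = [random.Random(s) for s in seeds]
```

Rerunning with the same counters reproduces the same seeds, so the simulation stays repeatable.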
1

I have found small but measurable correlations in random numbers generated by the Mersenne Twister when using sequential seeds for multiple simulations whose results are averaged to yield the final results. In Python on Linux, the correlations go away if I use seeds generated by random.SystemRandom(), which draws from the operating system's randomness source rather than from a pseudo-random generator. I store the SystemRandom numbers in files and read them back when a simulation needs a seed. To generate seeds:

import random
myrandom = random.SystemRandom()  # instantiate the OS-backed generator
x = myrandom.random()             # yields a float in [0, 1)
# dump x out to file...

Then, when seeds are needed:

import random
# read x from file...
newseed = int(x * 2**31)  # produce an integer in [0, 2**31)
random.seed(newseed)
nextran = random.random()
nextran = random.random()  # ...and so on
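
For a version that runs end to end, here is a minimal sketch of the same store-then-read workflow. The file name `seeds.txt`, the temporary directory, and the helper names `write_seeds` and `read_seeds` are all hypothetical, and it stores 32-bit integers directly instead of scaling a float.

```python
import os
import random
import tempfile

def write_seeds(path, count):
    """Write `count` OS-generated 32-bit seeds to `path`, one per line."""
    sysrand = random.SystemRandom()
    with open(path, "w") as f:
        for _ in range(count):
            f.write(f"{sysrand.getrandbits(32)}\n")

def read_seeds(path):
    """Read the seeds back as a list of integers."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

seed_path = os.path.join(tempfile.mkdtemp(), "seeds.txt")  # hypothetical location
write_seeds(seed_path, 5)
stored_seeds = read_seeds(seed_path)

rng = random.Random(stored_seeds[0])  # one stored seed per simulation run
first_draw = rng.random()
```

Keeping the seed file alongside the results is what makes the runs reproducible, since the seeds themselves cannot be regenerated.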
BugFinder
  • Can you describe how you measured similarity? – Homero Esmeraldo Jul 25 '16 at 17:53
  • I can't get into my particular application, but the result was almost the opposite of similar--nearby random numbers (a few calls apart) were slightly negatively correlated in that if one was less than some small number (~ 0.01), then the next few were less than randomly likely to also be less than that small number. I didn't do extensive testing and may have jumped the gun here, but there was a consistent trend in multiple instances of large numbers of simulations with sequential seeds. – BugFinder Jul 27 '16 at 00:38
0

First: define similarity. Next: code a similarity test. Then: check for similarity.

With only a vague description of similarity it is hard to check for it.
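
As one possible way to make this concrete, the sketch below defines similarity as the Pearson correlation between the streams produced by two adjacent seeds. That definition, the stream length, and the helper names are assumptions chosen for illustration. Other definitions (lagged correlation, matching bits, and so on) would need their own tests.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def stream(seed, n):
    """First `n` floats produced by random.Random(seed)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

n = 10_000
# Correlation between the streams of seeds (1, 2), (2, 3), ..., (5, 6)
correlations = [pearson(stream(s, n), stream(s + 1, n)) for s in range(1, 6)]
print(correlations)
```

For independent streams of this length, values on the order of 1/sqrt(n), about 0.01, are expected from chance alone, so only much larger magnitudes would suggest a problem.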

Paddy3118
0

What kind of simulation are you doing?

For simulation purposes your reasoning is valid (depending on the type of simulation). But if you use the same approach outside of simulation, in an environment whose security depends on the generated random numbers, it could easily be hacked.

If you are simulating whether the outcome of a machine is harmful to society or not, then your results will not be acceptable. That kind of simulation requires maximum randomness in every way possible, and I would never trust your reasoning there.

Subs
0

To quote the documentation from the random module:

General notes on the underlying Mersenne Twister core generator:

  • The period is 2**19937-1.
  • It is one of the most extensively tested generators in existence.

I'd be more worried about my code being broken than my RNG not being random enough. In general, your gut feelings about randomness are going to be wrong: the human mind is really good at finding patterns, even if they don't exist.

As long as you know your results aren't going to be 'secure' due to your lack of random seeding, you should be fine.
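
To illustrate why a known seed is fine for reproducibility but not for security, here is a small sketch. The variable names are hypothetical, and using the standard `secrets` module for the security-sensitive case is a suggestion of this example, not part of the answer.

```python
import random
import secrets

# Anyone who knows (or guesses) the seed can regenerate the exact stream.
a = random.Random(1)
b = random.Random(1)
same = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
print(same)

# For security-sensitive values, draw from the OS source instead.
token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
```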

Sean McSomething
  • Sorry, what do you mean by this: "your results aren't going to be 'secure' due to your lack of random seeding"? What do you mean by 'secure'? And are you indeed saying that it is not secure because of my sequential seeding? It seems contradictory to what you said before about the random module being trustworthy... – Homero Esmeraldo Jun 05 '12 at 22:21
  • @Homero : I mean that, if you're using this randomness for any sort of security purposes, you're vulnerable to anyone that knows your seeding method. If, OTOH, you're running some sort of simulation and just want reproducible results, you should be getting 'enough' randomness. – Sean McSomething Jun 14 '12 at 23:57
  • -1 because the question was about how a sequence of correlated seeds affects the quality of the random numbers produced, not about the quality of the random numbers when based on a single seed. This question is relevant, for example, when creating a bunch of different generators (maybe one per thread) at the start of a multithreaded program. You want to avoid correlated results among the generators. I think @rossum's answer is a good way to do this. – Tyler Streeter Feb 07 '13 at 22:34