1

I want to simulate a Bernoulli process. I toss a coin N times with:

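initRand();               // initialize (seed) the generator
p = 0.5;
int i;                    // declared outside the loop so the later loops can continue from i == N
for (i = 0; i < N; i++) {
  x = rand();             // assumed to return a uniform value in [0, 1)
  if (x < p) success();
  else failure();
}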

Now two scenarios:

(i) At this point I continue tossing the coin up to 2*N:

for (; i<2*N; i++) {
  x = rand(); 
  if ( x < p ) success();
  else failure();
}

(ii) Here I restart the random sequence and continue tossing up to 2*N:

initRand();
for (; i<2*N; i++) {
  x = rand();
  if ( x < p ) success();
  else failure();
}

In the first scenario, the probability of k successes over 2*N tosses is calculated as

P(k successes) = nchoosek(2*N,k)*p^k*(1-p)^(2*N-k)
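
For reference, the binomial formula above can be evaluated directly. This is only a minimal sketch in Python; the function name `binomial_pmf` and the value N = 10 are illustrative assumptions, not part of the original code.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N = 10   # illustrative value, not taken from the question
p = 0.5

# Distribution of the number of successes over 2*N independent tosses.
pmf = [binomial_pmf(k, 2 * N, p) for k in range(2 * N + 1)]

print(sum(pmf))                  # total probability, 1 up to rounding
print(binomial_pmf(N, 2 * N, p)) # probability of exactly N successes
```

The formula is only valid when all 2*N tosses are independent, which is exactly what the reset in scenario (ii) puts in question.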

Does the same hold for the second scenario? Or, because of the generator reset, can we no longer treat the 2*N tosses as a single process?

rlib

2 Answers

2

In general, the answer depends on the Pseudo-Random Number Generator (PRNG) algorithm used and how initRand is implemented.

PRNGs are designed to produce a sequence of values that statistically mimic being independent and identically distributed. How well they succeed varies a great deal. All PRNGs maintain some internal state, which gets updated algorithmically to produce the next value. Seeding a generator means picking the initial state. Matlab's default generator is Mersenne Twister (mt19937), which is pretty good as such things go. If you charge ahead without resetting, your Bernoulli trials will appear to be independent.

That brings us to the question of initRand. Since that's not a Matlab builtin, I have no idea how the one you're using is implemented. If it sets the PRNG to the same state every time you call it, then your two sequences will end up being perfectly correlated with each other. If it picks an arbitrary seed state based on local entropy, it's still possible to have some overlap of the sequences produced, and the results will be partially correlated. The good news is that with a state space of size 2^19937-1, the chance of seeing this happen in Mersenne Twister is unbelievably low. However, if it chooses a seed based on time and your program runs fast enough, there's a chance that the two sequences could be seeded in the same tick of the clock and would end up being identical.
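
The "same state every time" case can be illustrated with a short sketch. It uses Python's `random` module (which is also based on Mersenne Twister) rather than Matlab, and the fixed seed `12345` and the helper `toss` are assumptions made purely for illustration; how the asker's initRand actually behaves is unknown.

```python
import random


def toss(n, p, rng):
    """Return n Bernoulli(p) outcomes (True = success) drawn from rng."""
    return [rng.random() < p for _ in range(n)]


N, p = 1000, 0.5
rng = random.Random()

# Scenario (ii), assuming the reset restores the same state each time.
rng.seed(12345)
first = toss(N, p, rng)
rng.seed(12345)                      # hypothetical stand-in for a second initRand()
second_after_reset = toss(N, p, rng)

# Scenario (i): seed once and keep drawing.
rng.seed(12345)
_ = toss(N, p, rng)                  # first N tosses
second_continued = toss(N, p, rng)   # next N tosses from the same stream

print(first == second_after_reset)   # identical halves after the reset
print(first == second_continued)     # different halves when the stream continues
```

With a reset of this kind the second block of tosses carries no new information, whereas continuing the stream yields a fresh block.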

When all is said and done, your safest bet is to not reset the state with initRand in midstream.

pjs
  • I generate random strings of 5 lowercase alphabetic characters and I don't want these random strings to collide with a known string, say "barbi". A collision ("success") has probability p=1/26^5 (26 English letters). Each string generation is a Bernoulli trial, so the number of collisions follows a binomial distribution with mean N*p. Thus, after 10^10 trials I'll get about 841 collisions on average. Therefore, I don't want to let N reach 10^10 but to stop the "experiment" at N=10^5. This will prevent collisions. Will reseeding the random generator stop the experiment? – rlib Nov 02 '15 at 12:08
  • Speaking bluntly, reseeding a random number generator is a very bad idea unless you realio trulio know how things work and what you're doing. Legitimate reasons for reseeding include repeatability and intentionally inducing correlation structure between pairs of observations. Attempts to make things "more random" or "more independent" are misguided, and will often backfire. – pjs Nov 02 '15 at 15:57
0

All numbers generated by an algorithm are only pseudo-random. Resetting the algorithm therefore puts you back at the beginning of a predetermined sequence of numbers.

This means that in case (i) you get a sequence of 2*N numbers that can be treated as an independent random process.

However, if you reset the algorithm in (ii), you get the same N numbers you drew the first time. The numbers still come from the same distribution, but the second N numbers are perfectly correlated with the first N. The probability of success is therefore determined by the first N entries and not by all 2*N.
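
To make the consequence concrete: if the reset really does reproduce the same N draws, the total number of successes is twice a Binomial(N, p) count rather than a Binomial(2*N, p) count. The sketch below compares the two distributions exactly; the function names and N = 10 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_independent(N, p):
    """Scenario (i): successes over 2*N independent tosses."""
    return {k: binomial_pmf(k, 2 * N, p) for k in range(2 * N + 1)}


def pmf_repeated(N, p):
    """Scenario (ii), assuming the second N tosses repeat the first N exactly."""
    return {2 * j: binomial_pmf(j, N, p) for j in range(N + 1)}


def mean_and_variance(pmf):
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    return mean, var


N, p = 10, 0.5
print(mean_and_variance(pmf_independent(N, p)))  # mean 2*N*p, variance 2*N*p*(1-p)
print(mean_and_variance(pmf_repeated(N, p)))     # same mean, variance 4*N*p*(1-p)
```

Under this assumption the mean is unchanged, but only even totals are possible and the variance doubles, so the Binomial(2*N, p) formula no longer describes the outcome.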

Dennis Klopfer