3

Setting a seed ensures reproducibility and is important in simulation modelling. Consider a simple model f() with two variables y1 and y2 of interest. The outputs of these variables are determined by a random process (rbinom()) and the parameters x1 and x2. The outputs of the two variables of interest are independent of each other.

Now say we want to compare the change in the output of a variable after a change in the respective parameter has occurred with a scenario before the change was made (i.e. sensitivity analysis). If all other parameters have not been changed and the same seed was set, shouldn't the output of the unaffected variable remain the same as it is in the default simulation since this variable is independent of the other?

In short, why is the below output of variable y2 determined by parameter x2 changing after only a change in x1 occurs despite constant seed being set? One could just ignore the output of y2 and focus only on y1, but in a larger simulation where each variable is a cost component of the total cost the change in an unaffected variable may become problematic when testing the overall sensitivity of a model after individual parameter changes have been made.

#~ parameters and model

x1 <- 0.0
x2 <- 0.5
n  <- 10
ts <- 5

f <- function(){
  out <- data.frame(step = rep(0, n),
                    space = 1:n,
                    id = 1:n,
                    y1 = rep(1, n),
                    y2 = rep(0, n))
  
  l.out <- vector(mode = "list", length = n)
  
  for(i in 1:ts){
    out$step <- i
    out$y1[out$y1 == 0] <- 1
    out$id[out$y2 == 1]  <- seq_along(which(out$y2 == 1)) + n
    out$y2[out$y2 == 1] <- 0
    
    out$y1 <- rbinom(nrow(out), 1, 1-x1)
    out$y2 <- rbinom(nrow(out), 1, x2)
    
    n  <- max(out$id)
    l.out[[i]] <- out
  }
do.call(rbind, l.out)
}

#~ Simulation 1 (default)
set.seed(1)
run1 <- f()
set.seed(1)
run2 <- f()
run1 == run2 #~ all observations true as expected

#~ Simulation 2
#~ change in x1 parameter affecting only variable y1
x1 <- 0.25
set.seed(1)
run3 <- f()
set.seed(1)
run4 <- f()
run3 == run4 #~ all observations true as expected

#~ compare variables after change in x1 has occured
run1$y1 == run3$y1  #~ observations differ as expected
run1$y2 == run3$y2  #~ observations differ - why?
altfi_SU
  • 584
  • 1
  • 3
  • 15

1 Answers1

4

Great question. The reason for this behaviour is that when you set p = 0 or p = 1 in rbinom, the underlying C function realises it doesn't need to sample using the random number generator. The seed only changes when the random number generator is called, so if p is any number strictly between 0 and 1, the seed will change, but if p is 0 or 1 it won't. You can see this is the source code.

Under normal circumstances when p is more than zero or less than one, your set-up should work fine:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 0.1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE

But if you set p to 1 or 0, the results will be different:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] FALSE

To show this is correct, we should get y1 == y2 if we set p to 1 the first time and p to 0 the second time:

set.seed(1)
x1 <- rbinom(5, 1, 0)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Hi Allan, thanks for your brilliant answer. Now I have a follow-up question to you. Is there a way to ensure that a RNG is always sampled regardless of whether or not `p` is 0 or 1? In some cases I will need for the output of a variable to be either 0 or 100% of the random process - and in my model there are many `rbinom()` calls. – altfi_SU Nov 20 '20 at 12:42
  • @altfi_SU You could add the line `if(x1 == 0 || x1 == 1) { dummy <- rbinom(1, 1, 0.5)}` – Allan Cameron Nov 20 '20 at 13:16
  • Thanks, unfortunately, that doesn't seem to work though. – altfi_SU Nov 20 '20 at 13:31
  • Hi Allan, is there a chance I can have a short discussion with you in private? I have a few questions about RNG and would very much like to discuss them with someone. So far, you are the only one that has reached out. Your help would be greatly appreciated. – altfi_SU Nov 23 '20 at 08:24
  • @altfi_SU I always like to help, but I am no expert at RNG, and I know that you will get a better answer by posting a formal question and letting the whole community have a shot at answering than you will by speaking to me – Allan Cameron Nov 23 '20 at 08:44
  • Yes, posting a question will help, but the problem is that I cannot properly formulate the question I now have and may need to talk my thoughts out with someone with more experience than I. But if you cannot talk, I understand entirely. – altfi_SU Nov 23 '20 at 08:47