9

Imagine we are doing a number of processes where i want to set one overall seed at the beginning of a program: e.g.

mylist <- list( as.list(rep(NA,3)), as.list(rep(NA,3)) )
foo <- function(x){  for(i in 1:length(x)){ 
                       x[[i]] <- sample(100,1)
                         }
                      return(x) 
                     } 

# start block
set.seed(1)
l1 <- lapply(mylist, foo)
l2 <- lapply(mylist, foo)
# end

of course within a block l1 and l2 are different, but if i run the above block again l1 will be the same as before and l2 will be the same as before.

Imagine foo is horribly time consuming so i want to use mclapply not lapply, so i do:

library(parallel)

# start block
set.seed(1)
mclapply(mylist , foo,  mc.cores = 3)
mclapply(mylist , foo,  mc.cores = 3)
# end

If i run this block again I will get different results the next time. How do I produce the same behaviour as with setting one overall seed with lapply but using mclappy. I have looked through mclapply doc but I am not sure because using:

set.seed(1)
l1 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)
l2 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)

result in l1 and l2 being the same, which is not what I want...

user1320502
  • 2,510
  • 5
  • 28
  • 46
  • You can use `clusterSetupRNG` ... http://stackoverflow.com/questions/8358098/how-to-set-seed-for-random-simulations-with-foreach-and-domc-packages – user20650 May 26 '15 at 11:49
  • cheers but the library(doRNG) example appears to be dated and no longer functional and the clusterSetupRNG is not really quite what i requested, unless you can show me otherwise. – user1320502 May 26 '15 at 13:40
  • Seems to have changed slightly... have a look at pg three of the [`doRNG reference manual`](http://cran.r-project.org/web/packages/doRNG/index.html). Or use `snow` – user20650 May 26 '15 at 13:40
  • Actually the help page for `?'%dorng%'` gives an example – user20650 May 26 '15 at 13:46

1 Answers1

10

The parallel package comes with special support for the "L'Ecuyer-CMRG" random number generator which was introduced at the same time as parallel. You can read the documentation for that support using:

library(parallel)
?mc.reset.stream

To use it, you first need to enable "L'Ecuyer-CMRG":

RNGkind("L'Ecuyer-CMRG")

After doing that, code such as:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
mclapply(mylist, foo, mc.cores=3)

will be reproducible, but the two calls to mclapply will return identical results. This is because the state of the random number generator in the master process isn't changed by calling mclapply.

I've used the following function to skip over the random number streams used by the mclapply workers:

skip.streams <- function(n) {
  x <- .Random.seed
  for (i in seq_len(n))
    x <- nextRNGStream(x)
  assign('.Random.seed', x, pos=.GlobalEnv)
}

You can use this function to get the behavior that I think you want:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
Steve Weston
  • 19,197
  • 4
  • 59
  • 75