I'm using the parallel
package and mclapply()
to parallel process simulations in R, using R Programming for Data Science (Chapter 22, Section 22.4.1) as a reference.
I'm setting the seed as instructed, however, when I change the number of cores used in the mclapply()
function, I get different results even with the same seed.
A simple reprex:
# USING 2 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 2)
y <- do.call(rbind, x)
z <- mean(y)
print(mean(z))
# returns 0.143
# USING 3 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 3)
y <- do.call(rbind, x)
z <- mean(y)
print(mean(z))
# returns 0.035
How can I set the seed such that changing the number of cores used doesn't change the result? I feel like this should be a fairly simple thing to do - maintaining reproducibility irrespective of number of cores used.