
I'm using the parallel package and mclapply() to parallel process simulations in R, using R Programming for Data Science (Chapter 22, Section 22.4.1) as a reference.

I'm setting the seed as instructed. However, when I change the number of cores used in mclapply(), I get different results even with the same seed.

A simple reprex:

# USING 2 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 2)
y <- do.call(rbind, x)
z <- mean(y)
print(z)
# returns 0.143

# USING 3 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 3)
y <- do.call(rbind, x)
z <- mean(y)
print(z)
# returns 0.035

How can I set the seed so that changing the number of cores doesn't change the result? I feel like maintaining reproducibility irrespective of the number of cores used should be a fairly simple thing to do.

bob
  • Handling this is covered in [R Bloggers](https://www.r-bloggers.com/2018/07/%F0%9F%8C%B1-setting-a-seed-in-r-when-using-parallel-simulation/) – G5W May 23 '21 at 17:39
  • Hi @G5W - getting different results using different numbers of cores, even though a seed is set, is not an issue that seems to be covered in that article. Please correct me if I am wrong, but I do not see this answer touched on there. [Note that I do want to be able to set different seeds - e.g. I want to see the difference that changing seed will have. The article doesn't set a user-specified seed at all] – bob May 23 '21 at 23:09
  • You are using `set.seed`, which is only for the serial part of your code. Search down to the part about "Seeds for parallel" – G5W May 24 '21 at 11:26
  • @G5W thanks - my main question after seeing that is, how do I change the seed that is set? I don't see how to play around with different seeds. For example, if I wanted to change the seed to see whether my method remains stable, I would previously just set a new seed with `set.seed()`. How do I set a new seed with this method? – bob May 24 '21 at 23:07
  • The **future** framework gives reproducible RNG regardless of the number of parallel workers. To parallelize using _forked_ processes just like `mclapply()`, do `library(future.apply); plan(multicore, workers = 2); set.seed(1); y <- future_lapply(1:100, function(i) { rnorm(1) }, future.seed = TRUE)`. Retry with `plan(multicore, workers = 3)` and you'll see that you get identical results. This holds regardless of which type of parallel backend and how many workers you use. – HenrikB May 26 '21 at 01:41
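
For anyone who wants to stay with the base parallel package, here is a minimal sketch of one possible approach: give every task its own L'Ecuyer-CMRG stream, derived from a single user-chosen seed, so that the draw for task i no longer depends on which core runs it. The helper names `make_task_seeds()` and `simulate_mean()` are hypothetical (they are not part of any package), and the sketch assumes a Unix-like system where `mclapply()` can fork.

```r
library(parallel)

# Hypothetical helper (not part of the parallel package): derive one
# L'Ecuyer-CMRG stream per task from a single user-chosen seed.
# Note: this switches the session's RNG kind as a side effect.
make_task_seeds <- function(seed, n_tasks) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  seeds <- vector("list", n_tasks)
  s <- .Random.seed
  for (i in seq_len(n_tasks)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)  # advance to the next independent stream
  }
  seeds
}

# Hypothetical helper: run the simulation from the question, with the
# RNG state tied to the task index rather than to the worker.
simulate_mean <- function(seed, cores, n_tasks = 100) {
  seeds <- make_task_seeds(seed, n_tasks)
  x <- mclapply(seq_len(n_tasks), function(i) {
    # Install this task's stream in the forked worker before drawing
    assign(".Random.seed", seeds[[i]], envir = .GlobalEnv)
    rnorm(1)
  }, mc.cores = cores, mc.set.seed = FALSE)
  mean(unlist(x))
}

m2 <- simulate_mean(seed = 1, cores = 2)
m3 <- simulate_mean(seed = 1, cores = 3)
identical(m2, m3)
```

To try a different seed, change the `seed` argument, for example `simulate_mean(seed = 2, cores = 3)`. The values will not match those from the original reprex, because the draws come from per-task streams rather than per-worker streams.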
