Parallel processing in R using parallel package - not reproducible with different number of cores

Question

I'm using the parallel package and mclapply() to parallel process simulations in R, using R Programming for Data Science (Chapter 22, Section 22.4.1) as a reference.

I'm setting the seed as instructed. However, when I change the number of cores passed to mclapply(), I get different results even with the same seed.

A simple reprex:

# USING 2 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 2)
y <- do.call(rbind, x)
z <- mean(y)
print(z)
# returns 0.143

# USING 3 CORES
library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(1:100, function(i) {rnorm(1)}, mc.cores = 3)
y <- do.call(rbind, x)
z <- mean(y)
print(z)
# returns 0.035

How can I set the seed such that changing the number of cores used doesn't change the result? I feel like this should be a fairly simple thing to do - maintaining reproducibility irrespective of number of cores used.

Handling this is covered in [R Bloggers](https://www.r-bloggers.com/2018/07/%F0%9F%8C%B1-setting-a-seed-in-r-when-using-parallel-simulation/) — G5W, May 23 '21 at 17:39
Hi @G5W - getting different results using different numbers of cores, even though a seed is set, is not an issue that seems to be covered in that article. Please correct me if I am wrong, but I do not see this answer touched on there. [Note that I do want to be able to set different seeds - e.g. I want to see the difference that changing seed will have. The article doesn't set a user-specified seed at all] — bob, May 23 '21 at 23:09
You are using `set.seed`, which is only for the serial part of your code. Search down to the part about "Seeds for parallel" — G5W, May 24 '21 at 11:26
@G5W thanks - my main question after seeing that is, how do I change the seed that is set? I don't see how to experiment with different seeds. For example, if I wanted to change the seed to check that my method remains stable, I would previously just set a new seed with `set.seed()`. How do I set a new seed with this method? — bob, May 24 '21 at 23:07
The **future** framework gives reproducible RNG regardless of the number of parallel workers. To parallelize using _forked_ processes just like `mclapply()`, do `library(future.apply); plan(multicore, workers = 2); set.seed(1); y <- future_lapply(1:100, function(i) { rnorm(1) }, future.seed = TRUE)`. Retry with `plan(multicore, workers = 3)` and you'll see that you get identical results. This holds regardless of which type of parallel backend and how many workers you use. — HenrikB, May 26 '21 at 01:41
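
For anyone who wants to stay with the parallel package alone, here is a minimal sketch of the same idea, not a definitive implementation. By default `mclapply()` pre-schedules the tasks across the workers and each worker draws from its own RNG stream, so which stream produces which element depends on `mc.cores`. The sketch instead derives one L'Ecuyer-CMRG stream per task from a user-chosen seed with `nextRNGStream()` and installs it inside the task. `make_task_seeds()` and `run_sim()` are hypothetical helper names introduced for illustration; they are not part of any package. It assumes a Unix-like system, since `mc.cores > 1` is not supported on Windows.

```r
library(parallel)

# Hypothetical helper (not part of the parallel package): build one
# L'Ecuyer-CMRG stream seed per task, starting from a user-chosen seed.
make_task_seeds <- function(n, seed) {
  RNGkind("L'Ecuyer-CMRG")  # note: this changes the session's RNG kind
  set.seed(seed)
  seeds <- vector("list", n)
  s <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)   # jump ahead to the next independent stream
  }
  seeds
}

# Hypothetical wrapper around the reprex: task i always uses stream i,
# no matter which worker happens to run it.
run_sim <- function(n, seed, cores) {
  seeds <- make_task_seeds(n, seed)
  res <- mclapply(seq_len(n), function(i) {
    # Install this task's stream before drawing any random numbers.
    assign(".Random.seed", seeds[[i]], envir = globalenv())
    rnorm(1)
  }, mc.cores = cores, mc.set.seed = FALSE)
  unlist(res)
}

a <- run_sim(100, seed = 1, cores = 2)
b <- run_sim(100, seed = 1, cores = 3)
stopifnot(identical(a, b))  # same seed, different core counts
```

To try a different seed, change the `seed` argument (for example `run_sim(100, seed = 2, cores = 2)`); the result changes with the seed but should still not depend on `cores`.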
