The problem is that each parallel task only evaluates `matrix.list[[i]]`, and `[[` is very fast for lists, so the workers have almost nothing to do. The `.combine` operation is then performed by the master process after all parallel tasks have completed. You should split your list into chunks so that each worker sums a whole chunk:
set.seed(42)
n <- 1e3
# 1000 dense 1000 x 1000 matrices of doubles (~8 GB in total)
matrix.list <- replicate(n, matrix(rnorm(1), nrow = 1000, ncol = 1000), simplify = FALSE)
system.time({
  matrix.sum_s <- Reduce("+", matrix.list)
})
#   user  system elapsed
#   1.83    1.25    3.08
library(foreach)
library(doParallel)
ncl <- 4
cl <- makeCluster(ncl)
registerDoParallel(cl)
system.time({
  matrix.sum_p <- foreach(x = split(matrix.list, (seq_len(n) - 1) %/% (n / ncl)),
                          .combine = "+") %dopar% {
    Reduce("+", x)
  }
})
#   user  system elapsed
#   6.49   35.97   46.97
stopCluster(cl)
all.equal(matrix.sum_s, matrix.sum_p)
#[1] TRUE
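The same chunked-sum idea can also be expressed with base R's `parallel` package instead of `foreach`. Here is a minimal, self-contained sketch (using small 10 x 10 matrices so it runs quickly; `mats`, `partial`, and `total` are illustrative names, not from the code above):

```r
library(parallel)

set.seed(1)
mats <- replicate(20, matrix(rnorm(100), 10, 10), simplify = FALSE)

cl <- makeCluster(2)
chunks <- clusterSplit(cl, mats)     # one consecutive chunk of the list per worker
partial <- parLapply(cl, chunks, function(x) Reduce("+", x))  # each worker sums its chunk
total <- Reduce("+", partial)        # the master only adds the few partial sums
stopCluster(cl)

all.equal(total, Reduce("+", mats))
#[1] TRUE
```

`clusterSplit` takes care of dividing the list evenly across the workers, which is the same role the `split(..., %/%)` expression plays in the `foreach` version.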
Of course, the parallelized version is still much slower than simply using `Reduce`. Why? Because `+` is a fast low-level (`.Primitive`) function, so the additions themselves are cheap; `foreach` spends most of its time copying the several GB of dense matrices to the workers and back.
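You can check both halves of that explanation directly in R. `+` really is a `.Primitive`, and a single 1000 x 1000 double matrix occupies roughly 8 MB, so the list of 1000 matrices above is about 8 GB that must be serialized to the workers (`size_one` is an illustrative name):

```r
# The addition operator is a primitive, implemented in C with no R-level overhead.
is.primitive(`+`)
#[1] TRUE

# One 1000 x 1000 matrix of doubles: 1e6 values * 8 bytes each, plus a small header.
size_one <- as.numeric(object.size(matrix(0, 1000, 1000)))
size_one / 2^20   # roughly 8 MB per matrix, so ~8 GB for the full list
```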