1

I have a long list of large matrices (or rasters) that I want to sum in parallel. The output should be a matrix. I tried foreach using the .combine='+' function, which works, but it seems .combine only works on one thread so there is no speedup. Any suggestions? Thanks

matrix.list <- list()
for(i in 1:10000) matrix.list[[i]] <- matrix(1,nrow=100,ncol=100)

library(foreach)
library(doMC)
registerDoMC(cores=2)

matrix.sum <- foreach(i=1:10000,.combine='+') %dopar% matrix.list[[i]]
Hack-R
  • 22,422
  • 14
  • 75
  • 131
  • exactly. I have 600,000 matrices of size 7200x3600 – Eric Collins Sep 07 '16 at 02:54
  • they're sparse but I didn't want to complicate things – Eric Collins Sep 07 '16 at 03:08
  • Not a parallel solution but `Reduce('+', matrix.list)` ran around 18 times faster than the above code. – jav Sep 07 '16 at 04:27
  • Of course it runs only on one core. It's run after the parallized loop. All you do there in parallel is list extraction. If you want to leverage parallelization you should do the matrix addition in chunks. – Roland Sep 07 '16 at 06:03

1 Answers1

0

Your problem is that you only do matrix.list[[i]] in parallel and [[ is very fast for lists. The .combine operation is done by the master process after all parallel tasks have been completed.

You should separate your list into chunks like this:

set.seed(42)
n <- 1e3
matrix.list <- replicate(n, matrix(rnorm(1),nrow=1000,ncol=1000), simplify = FALSE)

system.time({
matrix.sum_s <- Reduce("+", matrix.list)
})
#user  system elapsed 
#1.83    1.25    3.08

library(foreach)
library(doParallel)
ncl <- 4
cl <- makeCluster(ncl)
registerDoParallel(cl)

system.time({
matrix.sum_p <- foreach(x = split(matrix.list, (seq_len(n) - 1) %/% (n/ncl)), 
                       .combine='+') %dopar% 
   {Reduce("+", x)}
})
#user  system elapsed 
#6.49   35.97   46.97 
stopCluster(cl)

all.equal(matrix.sum_s, matrix.sum_p)
#[1] TRUE

Of course, the parallelized version is still much slower than simply using Reduce. Why? Because + is a fast low-level (.Primitive) function. foreach spends the time mostly with copying the several GB of dense matrices.

Roland
  • 127,288
  • 10
  • 191
  • 288