
Is nesting parallel::mclapply calls a good idea?

require(parallel)
ans <- mclapply(1:3, function(x) mclapply(1:3, function(y) y * x))
unlist(ans)

Output:

[1] 1 2 3 2 4 6 3 6 9

So it's "working". But is it recommended for real compute-intensive tasks that outnumber the available cores? What is going on when this is executed? Are the multiple forks involved potentially wasteful? What are the considerations for mc.cores and mc.preschedule?

Edit: To clarify the motivation, it often seems natural to parallelize by splitting along one dimension (e.g., use different cores to handle data from n different years), and then within this split there is a second natural split (e.g., use different cores to calculate each of m different functions). When m times n is smaller than the total number of available cores, the above nesting looks sensible, at least on the face of it.

dzeltzer

1 Answer


In the following experiment, the flat parallel execution of the test function testfn() was faster than the nested parallel execution (both variants make 8 calls to testfn() with 4 worker processes doing the computing):

library(parallel)
library(microbenchmark)
testfn <- function(x) rnorm(10000000)

microbenchmark('parallel'= o <- mclapply(1:8, testfn, mc.cores=4),
               'nested'  = o <- mclapply(1:2, function(x) mclapply(1:4, testfn, mc.cores=2), 
                                         mc.cores=2),
               times=10)

Unit: seconds
     expr      min       lq     mean   median       uq      max neval
 parallel 3.727131 3.756445 3.802470 3.815977 3.834144 3.890128    10
   nested 4.355846 4.372996 4.508291 4.453881 4.578837 4.863664    10
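The "parallel" variant in this benchmark is the nested loop flattened into a single list of 2 × 4 = 8 tasks. The same flattening applies to the question's years-by-functions case. Below is a minimal sketch, assuming a Unix-like OS (mclapply cannot use mc.cores > 1 on Windows); `years`, `fns` and `get_year_data()` are hypothetical stand-ins, not part of the original answer.

```r
library(parallel)

# Hypothetical stand-ins for "n different years" and "m different functions"
years <- 2001:2003
fns <- list(mean = mean, median = median)

# Hypothetical data loader: one numeric vector per year (seeded for reproducibility)
get_year_data <- function(year) {
  set.seed(year)
  rnorm(100)
}

# One row per (year, function) pair, i.e. n * m independent tasks
tasks <- expand.grid(year = years, fn = names(fns), stringsAsFactors = FALSE)

# A single, flat mclapply over all n * m tasks instead of nested calls
res <- mclapply(seq_len(nrow(tasks)), function(i) {
  fns[[tasks$fn[i]]](get_year_data(tasks$year[i]))
}, mc.cores = 2)

tasks$value <- unlist(res)
print(tasks)
```

With a flat task list, mc.cores can simply be set to the number of cores you want to use, and no process sits idle waiting for children of its own.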

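Explanation:
The communication between the R session and four R workers seems to be more efficient than the communication between the R session and two workers, which in turn fork and communicate with two further workers each. In the nested version, every result of testfn() (a vector of 10^7 doubles, roughly 80 MB) has to be serialized and sent twice, first from a grandchild to its parent worker and then from that worker to the main session, while the two intermediate workers mostly sit waiting on their own children.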
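To see what happens when the nested call runs, the sketch below has every task report its process ID via Sys.getpid(). With mclapply's default mc.allow.recursive = TRUE, each outer child forks its own children for the inner call; setting mc.allow.recursive = FALSE on the inner call makes it fall back to a plain lapply() inside the outer child, so no extra processes are created. This assumes a Unix-like OS, and probe() is a hypothetical helper written for illustration.

```r
library(parallel)

# Hypothetical helper: each outer task records its own PID and the PIDs
# of whichever processes evaluated its inner tasks
probe <- function(allow_recursive) {
  mclapply(1:2, function(i) {
    inner <- mclapply(1:2, function(j) Sys.getpid(),
                      mc.cores = 2, mc.allow.recursive = allow_recursive)
    list(outer = Sys.getpid(), inner = unlist(inner))
  }, mc.cores = 2)
}

nested <- probe(TRUE)   # inner tasks run in grandchildren: PIDs differ from outer
serial <- probe(FALSE)  # inner tasks run inside the outer child: PIDs match outer
str(nested)
str(serial)
```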

Alternative:
The package foreach can handle nested loops with its nesting operator %:%, which is close to nested mclapply() calls; see the vignette https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf.
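A minimal sketch of the question's example rewritten with foreach's %:% operator, assuming the foreach package is installed. %do% runs sequentially; to run in parallel you would register a backend (for example via the doParallel package) and use %dopar% instead. Using .combine = "c" at both levels flattens the result in the same order as unlist(ans) in the question, so this should print 1 2 3 2 4 6 3 6 9.

```r
library(foreach)

# %:% merges the two loops into one stream of 3 * 3 tasks,
# rather than one outer task per x that launches its own inner loop
res <- foreach(x = 1:3, .combine = "c") %:%
  foreach(y = 1:3, .combine = "c") %do% {
    y * x
  }
print(res)
```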

(The optimal setting of the argument mc.preschedule depends on the specific problem; see the help page ?mclapply.)
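A small sketch of what mc.preschedule changes, again using Sys.getpid() to see which process handled which element (Unix-like OS assumed). With mc.preschedule = TRUE, X is split into mc.cores chunks before any forking (in current R sources the split is round-robin), so each child handles several elements; with FALSE, one child is forked per element, at most mc.cores at a time.

```r
library(parallel)

# Prescheduled: 6 elements split up front between 2 children
# (round-robin, so elements 1, 3, 5 share one PID and 2, 4, 6 share the other)
pre <- unlist(mclapply(1:6, function(i) Sys.getpid(),
                       mc.cores = 2, mc.preschedule = TRUE))

# Not prescheduled: a fresh child is forked for every element,
# with at most 2 running concurrently
lazy <- unlist(mclapply(1:6, function(i) Sys.getpid(),
                        mc.cores = 2, mc.preschedule = FALSE))

print(pre)
print(lazy)
```

Roughly, prescheduling suits many short tasks of similar cost (fewer forks), while mc.preschedule = FALSE suits a few long tasks of uneven duration, where a fixed split could leave one core idle while another is still working through its chunk.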

Nairolf