
All the official tutorials I've found so far (doParallel, the doParallel vignette, doMC and the doMC vignette) only cover how to use parallel computation in combination with foreach. Is there a way to speed up "sequential" code like the following as well?

Imagine it as splitting one script into multiple files and executing each file with a separate instance of R, e.g.

## <run on core1>
data1 <- getData1()
dataResult1 <- doComplexAlgorithm1(data1)
## </run on core1>

## <run on core2>
data2 <- getData2()
dataResult2 <- doComplexAlgorithm2(data2)
## </run on core2>

## <run on core3>
data3 <- getData3()
dataResult3 <- doComplexAnotherAlgorithm3(data3)
## </run on core3>

## <run on core4>
data4 <- getData4()
dataResult4 <- doComplexNotSoComplexAlgorithm4(data4)
## </run on core4>

Thanks in advance!

(R v.3.2.1, RStudio v.0.99.451)

Boern

3 Answers


In the base (single-process) scenario, you'd use mapply, passing it lists of your data-getting functions and your algorithms:

mapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4))

In the parallel processing case, you can use clusterMap:

library(parallel)
cl <- makeCluster(4)  # one worker per independent section
clusterMap(cl, function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4))
stopCluster(cl)
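
On Linux or macOS you could also skip the cluster setup and use the fork-based analogue mcmapply from the same package. A minimal sketch, assuming four cores are available and reusing the same function lists:

library(parallel)

## fork-based variant of the same idea (not available on Windows);
## mc.cores = 4 is an assumption, one core per section
mcmapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4),
SIMPLIFY = FALSE, mc.cores = 4)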
Hong Ooi

It sounds like you want to do what I do with images: I have a set of images and a computation on each of them that by itself takes quite long. My approach is to keep a list of files and then:

foreach(i = 1:length(fileList)) %dopar% {
    # load the data for file i
    # do something with it
    # write the result to disk
}

It's just as you describe: each set of data (each file) is processed on its own core, provided your system has enough memory to hold everything at once.
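
A minimal, self-contained version of that pattern, assuming doParallel as the backend and one .rds file per data set (the paths, the worker count and doComplexAlgorithm1 as the processing step are placeholders):

library(doParallel)

cl <- makeCluster(4)        # worker count is an assumption
registerDoParallel(cl)

fileList <- list.files("input", pattern = "\\.rds$", full.names = TRUE)

foreach(f = fileList) %dopar% {
    dat <- readRDS(f)                                 # load data
    res <- doComplexAlgorithm1(dat)                   # do something
    saveRDS(res, file.path("output", basename(f)))    # write result to disk
    NULL                                              # nothing to keep in memory
}

stopCluster(cl)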

Kendel
NNN
  • Thanks for the reply! The problem is that the operations in the different sections differ greatly: it's a mixture of SQL commands and a few data-preparation steps. I've updated my question to make that clearer. – Boern Sep 14 '15 at 07:22

So your jobs don't need any memory sharing or communication with each other; they are completely independent.

The foreach and lapply paradigms are designed more for splitting a loop or a vectorised computation across workers. For totally independent jobs, you need to wrap another layer to turn them into a loop.

Wrap each section into a function, put all the functions into a list, then call each function in a loop:

fun_list <- list(
  fun_1 = function() {
    data1 <- getData1()
    doComplexAlgorithm1(data1)
  },
  fun_2 = function() {
    data2 <- getData2()
    doComplexAlgorithm2(data2)
  }
  # ... and so on for the remaining sections
)
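
To actually run the wrapped sections in parallel, one option is a sketch along these lines (mclapply forks and is not available on Windows, where parLapply with a cluster would be the equivalent; the core count of 4 is an assumption):

library(parallel)

## run each wrapped section on its own core; results is a list,
## with results[[1]] being the return value of fun_1, and so on
results <- mclapply(fun_list, function(f) f(), mc.cores = 4)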
dracodoc