
All the official tutorials I've found so far (doParallel, the doParallel vignette, doMC and the doMC vignette) only cover how to use parallel computation in combination with foreach. Is there a way to speed up "sequential" code like the following as well?

Imagine it as splitting one script into multiple files and executing each file with a separate instance of R, e.g.

## <run on core1>
data1 <- getData1()
dataResult1 <- doComplexAlgorithm1(data1)
## </run on core1>

## <run on core2>
data2 <- getData2()
dataResult2 <- doComplexAlgorithm2(data2)
## </run on core2>

## <run on core3>
data3 <- getData3()
dataResult3 <- doComplexAnotherAlgorithm3(data3)
## </run on core3>

## <run on core4>
data4 <- getData4()
dataResult4 <- doComplexNotSoComplexAlgorithm4(data4)
## </run on core4>

Thanks in advance!

(R v.3.2.1, RStudio v.0.99.451)

Boern

3 Answers


In the base (single-process) scenario, you'd use mapply, passing it lists of your data-getting functions and your algorithms:

mapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4))

In the parallel processing case, you can use clusterMap:

library(parallel)
cl <- makeCluster(4)  # one worker per independent section
clusterMap(cl, function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4))
stopCluster(cl)
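
On Linux or macOS you could also skip the cluster setup and use the fork-based analogue mcmapply from the same package. A minimal sketch, assuming four cores are available and reusing the same function lists:

library(parallel)

## fork-based variant of the same idea (not available on Windows);
## mc.cores = 4 is an assumption, one core per section
mcmapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(doComplexAlgorithm1, doComplexAlgorithm2,
             doComplexAnotherAlgorithm3, doComplexNotSoComplexAlgorithm4),
SIMPLIFY = FALSE, mc.cores = 4)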
Hong Ooi

It sounds like you want to do what I do with images: I have a set of images and a computation on each of them that by itself takes quite long. My approach is to keep a list of files and then:

foreach(i = 1:length(fileList)) %dopar% {
    # load the data for file i
    # do something with it
    # write the result to disk
}

It's just as you describe: each set of data (each file) is processed on its own core, provided your system has enough memory to hold everything at once.
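
A minimal, self-contained version of that pattern, assuming doParallel as the backend and one .rds file per data set (the paths, the worker count and doComplexAlgorithm1 as the processing step are placeholders):

library(doParallel)

cl <- makeCluster(4)        # worker count is an assumption
registerDoParallel(cl)

fileList <- list.files("input", pattern = "\\.rds$", full.names = TRUE)

foreach(f = fileList) %dopar% {
    dat <- readRDS(f)                                 # load data
    res <- doComplexAlgorithm1(dat)                   # do something
    saveRDS(res, file.path("output", basename(f)))    # write result to disk
    NULL                                              # nothing to keep in memory
}

stopCluster(cl)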

Kendel
NNN
  • Thanks for the reply! The problem is that the operations in the different sections differ greatly: it's a mixture of SQL commands and a few data-preparation steps. I've updated my question to make that clearer. – Boern Sep 14 '15 at 07:22

So your jobs don't need any memory sharing or communication with each other; they are completely independent.

The foreach and lapply paradigms are designed more for splitting a loop or a vectorised computation across workers. For totally independent jobs, you need to wrap another layer to turn them into a loop.

Wrap each section into a function, put all the functions into a list, then call each function in a loop:

fun_list <- list(
  fun_1 = function() {
    data1 <- getData1()
    doComplexAlgorithm1(data1)
  },
  fun_2 = function() {
    data2 <- getData2()
    doComplexAlgorithm2(data2)
  }
  # ... and so on for the remaining sections
)
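
To actually run the wrapped sections in parallel, one option is a sketch along these lines (mclapply forks and is not available on Windows, where parLapply with a cluster would be the equivalent; the core count of 4 is an assumption):

library(parallel)

## run each wrapped section on its own core; results is a list,
## with results[[1]] being the return value of fun_1, and so on
results <- mclapply(fun_list, function(f) f(), mc.cores = 4)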
dracodoc