4

Below you can find a piece of code in R which I would like to convert to run as a parallel process using several CPU's. I tried using foreach package, but didn't go far.. I could not find a good example how to make it work given the fact I have 3-level nested loop. Help would be extremely appreciated. Example of code below - I made a simple function so it can serve as an example:

celnum <- c(10,20,30)
t2 <- c(1,2,3)
allrepeat <- 10

samplefunction <- function(celnum,t2){

        x <- rnorm(100,celnum,t2)
        y = sample(x, 1)
        z = sample(x,1)

        result = y+z


        result 
}

Getting results conventional way:

z_grid <- matrix(, nrow = length(celnum), ncol = length(t2))

repetitions <- matrix(, nrow = allrepeat, ncol = 1)



set.seed=20
for(i in 1:length(celnum)){
        for (j in 1:length(t2)){
                for (k in 1:allrepeat) {
                        results <- samplefunction(celnum[i],t2[j]) 
                                repetitions[k] <- results
                                z_grid[i,j] <- mean(repetitions,na.rm=TRUE) 
                }  
        }
}

z_grid

Now trying to do the same using foreach:

set.seed=20

library(foreach)
library(doSNOW)

cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

set.seed=20
output <- foreach(i=1:length(celnum),.combine='cbind' ) %:% 
        foreach (j=1:length(t2), .combine='c') %:%   
                foreach (k = 1:allrepeat) %do% {
                        mean(samplefunction(celnum[i],t2[j]) )
}  
output

This doesn't work as I would like it to, as it is returning a matrix of 30x2 dimensions instead 3x3. My intention is to simulate the scenario for i and j combinations k times, and would like to get an average of these k simulations for each combination of i and j.

svick
  • 236,525
  • 50
  • 385
  • 514
MIH
  • 1,083
  • 3
  • 14
  • 26
  • 2
    perhaps you have a look at the [documentation](http://stackoverflow.com/documentation/r/1677/parallel-processing/5164/parallel-processing-with-foreach-package#t=201607251334159676214) – loki Jul 25 '16 at 13:34
  • @loki : thanks this is very good. I can run a simple loop with foreach. I do however have a difficult time converting this piece of code which gives me an output of a list of four matrices as a result, and where there is a 3-level nested loop to run the function. I am still on a steep learning curve with R – MIH Jul 25 '16 at 13:44

1 Answers1

6

EDIT:

The nested for loops should look like this. Note that there is only one foreach and two for loops nested.

library(foreach)
library(doSNOW)

cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

set.seed(20)
output <- foreach(k=1:allrepeat) %dopar% {
  df <- data.frame()
  for (i in 1:length(t2)) {
    for (j in 1:length(celnum)) {
      df[i,j] <- mean(samplefunction(celnum[i],t2[j]))
    }  
  }
  df
}

The Result output as well is a list. To calculate the cell means this post helped a lot.

library(plyr)
aaply(laply(output, as.matrix), c(2,3), mean)

#   X2
# X1       V1       V2       V3
#  1 20.30548 21.38818 18.49324
#  2 40.09506 40.64564 40.34847
#  3 60.10946 59.68913 58.66209

btw: you should ...

stopCluster(cl)

... afterwards.


Original Post:

At first, you have to determine which of the for loops you want to replace with the foreach loop.

Basically this decision is mostly influenced by the results of the loop and thus, how this results can be combined. Since you outsource single processes to the individual processors of your PC, only the last element will be returned. These results will be combined as stated in the .combine parameter (e.g. 'c', 'cbind', etc.). Since you are trying to generate two lists this might be probably not very easy for a first start. Thus, I would like to propose an example which outlines the functionality of the foreach loop nested within other for loops.

library(foreach)
library(doSNOW)

dat1 <- c(15.2, 12.58, 4.25, 1.05, 6.78, 9.22, 11.20)
dat2 <- data.frame(matrix(1:15, ncol = 3))


cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

for (i in 1:nrow(dat2)) {
  FEresult <- foreach(j = 1:ncol(dat2), .combine = c, .inorder = TRUE) %dopar% {
    tmp <- dat1 * dat2[i, j]
    data.frame(tmp)
  }
  FEresult
  if (i == 1) {
    res <- FEresult
  } else {
    res <- rbind(res, FEresult)
  }
}

res

You will notice, that the result of this loop is a list.

Community
  • 1
  • 1
loki
  • 9,816
  • 7
  • 56
  • 82
  • Thanks. I decided to simplify the function slightly (see edited post). Do you mind having a look and see what final tweak will help it work? – MIH Jul 27 '16 at 13:32
  • when I copy the loop, it works. perhaps you edit the error message into your question, so that I can have a look at it – loki Jul 27 '16 at 14:11
  • There is no error message but the results are not as desired. I would like to get the same results in `output` as in `z_grid`. The former is a [3,3] matrix of [i,j] where k simulations of this matrix are summed up as a mean in each matrix entry. Does it makes sense? In other words, I simulate the scenario for i and j combinations k times, and would like to get a mean of these k simulations for each combination of i and j. The current output with `foreach` gives me 30x3 output dimension and I am not even sure exactly how they are ordered. – MIH Jul 27 '16 at 14:22
  • thanks, works great on this example. however, when i am replacing this function with my more complicated function (fully working, all libraries installed, works with previous loops i had etc) somehow, with %dopar% I get an error message: `Error in { : task 1 failed - "could not find function "SpatialPoints"" ` . When I do %do% there is no problem and the statement is executed. Ever encountered such problem? – MIH Jul 28 '16 at 15:13
  • 1
    you have to add the parameter `foreach (..blabla.., .packages = c("sp", "rgdal")` or whatever packages you want to use during the foreach loop. Thus, the packages are loaded in the individual processes. – loki Jul 28 '16 at 15:43