
I am calculating a permutation test statistic using a for loop. I wish to speed this up using parallel processing (in particular, foreach from the foreach package). I am following the instructions from: https://beckmw.wordpress.com/2014/01/21/a-brief-foray-into-parallel-processing-with-r/

My original code:

library(foreach)
library(doParallel)
set.seed(10)
x = rnorm(1000)
y = rnorm(1000)
n = length(x)
nexp = 10000
perm.stat1 = numeric(nexp)
ptm = proc.time()
for (i in 1:nexp){
  y = sample(y)
  perm.stat1[i] = cor(x,y,method = "pearson")
  }
proc.time()-ptm
# 1.321 seconds

However, when I used a foreach loop, it ran much slower:

cl<-makeCluster(8)
registerDoParallel(cl)
perm.stat2 = numeric(n)
ptm = proc.time()
perm.stat2 = foreach(icount(nexp), .combine=c) %dopar% {
  y = sample(y)
  cor(x,y,method = "pearson")
}
proc.time()-ptm
stopCluster(cl)
# 3.884 seconds

Why is this happening? What did I do wrong? Thanks

user227710
Kevin

2 Answers

You're getting bad performance because you're splitting up a small problem into 10,000 tasks, each of which takes about an eighth of a millisecond to execute. It's alright to simply turn a for loop into a foreach loop when the body of the loop takes a significant period of time (I used to say at least 10 seconds, but I've dropped that to at least a second nowadays), but that simple strategy doesn't work when the tasks are very small (in this case, extremely small). When the tasks are small you spend most of your time sending the tasks and receiving the results from workers. In other words, the communication overhead is greater than the computation time. Frankly, I'm amazed that you didn't get much worse performance.
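
As a rough sketch of how to see that overhead directly (this assumes a cluster is already registered with registerDoParallel, and the estimate is only a ballpark), you can time a foreach loop whose body does essentially nothing and divide by the number of tasks, then compare that with the roughly 0.1 ms each cor() call takes here:

nexp <- 10000
t.trivial <- system.time(
  foreach(icount(nexp), .combine=c) %dopar% NA_real_
)["elapsed"]
t.trivial / nexp   # approximate dispatch/communication cost per task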

To me, it doesn't really seem worthwhile to parallelize a problem that takes less than two seconds to execute, but you can actually get a speed up using foreach by chunking. That is, you split the problem into smaller chunks, usually giving one chunk to each worker. Here's an example:

nw <- getDoParWorkers()   # number of registered workers
perm.stat1 <-
  foreach(xnexp=idiv(nexp, chunks=nw), .combine=c) %dopar% {
    # each task handles one chunk of roughly nexp/nw permutations
    p = numeric(xnexp)
    for (i in 1:xnexp) {
      y = sample(y)
      p[i] = cor(x,y,method="pearson")
    }
    p   # return the chunk; .combine=c concatenates the chunks
  }

As you can see, the foreach loop is splitting the problem into chunks, and the body of that loop contains a modified version of the original sequential code, now operating on a fraction of the entire problem.

On my four core Mac laptop, this executes in 0.447 seconds, compared to 1.245 seconds for the sequential version. That seems like a very respectable speed up to me.
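
For completeness, here is a sketch of the chunked version run end to end; the cluster size and the name perm.stat3 are arbitrary choices, and timings will vary by machine:

library(doParallel)

cl <- makeCluster(4)   # adjust to your core count
registerDoParallel(cl)

set.seed(10)
x <- rnorm(1000)
y <- rnorm(1000)
nexp <- 10000

nw <- getDoParWorkers()
ptm <- proc.time()
perm.stat3 <- foreach(xnexp=idiv(nexp, chunks=nw), .combine=c) %dopar% {
  p <- numeric(xnexp)
  for (i in 1:xnexp) {
    y <- sample(y)
    p[i] <- cor(x, y, method = "pearson")
  }
  p
}
proc.time() - ptm

stopCluster(cl)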

Steve Weston

There's a lot more computational overhead in the foreach loop. foreach returns a list containing the result of each execution of the loop body, which is then combined into a vector via the .combine=c argument. The for loop does not return anything; it assigns values into perm.stat1 as a side effect, so it avoids that extra overhead.
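
A minimal sketch of that difference, using %do% so no cluster is needed:

library(foreach)

# for: fills a preallocated vector by side effect and returns nothing
res.for <- numeric(5)
for (i in 1:5) res.for[i] <- i^2

# foreach: every iteration returns a value, and .combine=c collapses the
# list of results into a vector -- that collection step is bookkeeping
# the for loop never pays for
res.foreach <- foreach(i = 1:5, .combine = c) %do% i^2

identical(res.for, res.foreach)  # TRUE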

Have a look at Why is foreach() %do% sometimes slower than for? for a more in-depth explanation of why foreach is slower than for in many cases. Where foreach comes into its own is when the operations inside the loop are computationally intensive, making the time penalty of returning each value in a list insignificant by comparison. An example is the combination of rnorm and summary used in the WordPress article linked above.
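
As a sketch in the spirit of that article (the sample size and iteration count here are arbitrary), each iteration below simulates a large sample and summarises it, so the per-task computation dwarfs the communication overhead:

library(doParallel)

cl <- makeCluster(detectCores() - 1)   # leave one core free; adjust as needed
registerDoParallel(cl)

# Each task does a substantial amount of work, so shipping the result
# back to the master is cheap by comparison.
system.time(
  res <- foreach(icount(200), .combine = rbind) %dopar% {
    summary(rnorm(1e6))
  }
)
dim(res)   # one row of summary statistics per iteration

stopCluster(cl)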

N McWilliams