
I'm playing around with parallelization in R for the first time. As a first toy example, I tried

library(doMC)
registerDoMC()

B<-10000

myFunc<-function()
{
    for(i in 1:B) sqrt(i)
}

myFunc2<-function()
{
    foreach(i = 1:B)  %do% sqrt(i)
}

myParFunc<-function()
{
    foreach(i = 1:B) %dopar% sqrt(i)
}

I know that sqrt() executes too fast for parallelization to matter, but what I didn't expect was that foreach() %do% would be slower than for():

> system.time(myFunc())
   user  system elapsed 
  0.004   0.000   0.005 
> system.time(myFunc2())
   user  system elapsed 
  6.756   0.000   6.759 
> system.time(myParFunc())
   user  system elapsed 
  6.140   0.524   6.096 

In most examples that I've seen, foreach() %dopar% is compared to foreach() %do% rather than for(). Since foreach() %do% was much slower than for() in my toy example, I'm now a bit confused. Somehow, I thought that these were equivalent ways of constructing for-loops. What is the difference? Are they ever equivalent? Is foreach() %do% always slower?

UPDATE: Following @Peter Fine's answer, I updated myFunc as follows:

 a<-rep(NA,B)
 myFunc<-function()
 {
     for(i in 1:B) a[i]<-sqrt(i)
 }

This makes for() a bit slower, but not much:

> system.time(myFunc())
   user  system elapsed 
  0.036   0.000   0.035 
> system.time(myFunc2())
   user  system elapsed 
  6.380   0.000   6.385 
MånsT
    See also this question: http://stackoverflow.com/questions/5007458/problems-using-foreach-parallelization and this one: http://stackoverflow.com/questions/5012804/mpi-parallelization-using-snow-is-slow – Charlie May 02 '12 at 16:14
  • Thanks @Charlie, the answers to those questions were very helpful to what I'm trying to do after I'm done with my toy example! :) I'm still not sure that I understand why `foreach` needs so much more time when using the `%do%` option though. – MånsT May 02 '12 at 16:33
  • A big part of it is that %do% has to parcel out the pieces/assignments, send them to the processors, then rejoin everything at the end as appropriate. These steps require organizational time that the unparallelized version doesn't. – Charlie May 02 '12 at 17:04
  • Isn't that what `%dopar%` does? – MånsT May 02 '12 at 17:06
  • Interesting! **I got exactly the opposite result!** See [Why is R for loop 10 times slower than when using foreach?](http://stackoverflow.com/questions/24651664/why-is-r-for-loop-10-times-slower-than-when-using-foreach) – Tomas Jul 09 '14 at 10:53

1 Answer


for will run sqrt B times, presumably discarding the answer each time. foreach, however, returns a list containing the result of each execution of the loop body. This would contribute considerable extra overhead, regardless of whether it's running in parallel or sequential mode (%dopar% or %do%).

I based my answer on running the following code, which appears to be confirmed by the foreach vignette, which states: "foreach differs from a for loop in that its return is a list of values, whereas a for loop has no value and uses side effects to convey its result."

> print(for(i in 1:10) sqrt(i))
NULL

> print(foreach(i = 1:10) %do% sqrt(i))
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
... etc
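
To make the two constructs directly comparable, you can collect the results from both sides. This is only a sketch (not part of the original timings): the for loop writes into a preallocated vector, while foreach can flatten its list into a vector via its .combine argument.

```r
library(foreach)

# for loop: results conveyed by side effect into a preallocated vector
res_for <- numeric(10)
for (i in 1:10) res_for[i] <- sqrt(i)

# foreach: results are the loop's value; .combine = "c" flattens the
# default list into a plain numeric vector
res_foreach <- foreach(i = 1:10, .combine = "c") %do% sqrt(i)

identical(res_for, res_foreach)  # should be TRUE: same values, different machinery
```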

UPDATE: I see from your updated question that the above answer isn't nearly sufficient to account for the performance difference. So I looked at the source code for foreach, and there is a LOT going on! I haven't tried to understand exactly how it works, but do.R and foreach.R show that even when %do% is run, large parts of the foreach configuration are still executed. That would make sense if the %do% option is largely provided to let you test foreach code without having to configure and load a parallel backend. foreach also needs to support the more advanced nesting and iteration facilities that it provides, which adds machinery a bare for loop doesn't have.

There are references in the code to results caching, error checking, debugging and the creation of local environment variables for the arguments of each iteration (see the function doSEQ in do.R for example). I'd imagine this is what creates the difference that you've observed. Of course, if you were running much more complicated code inside your loop (that would actually benefit from a parallelisation framework like foreach), this overhead would become irrelevant compared with the benefits it provides.
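
To illustrate that last point, here is a rough sketch (not from the original post) of the regime where %dopar% pays off: when each iteration is expensive, foreach's per-iteration overhead becomes negligible. The core counts and timings below are assumptions for a multicore Unix-like machine, since doMC does not run on Windows.

```r
library(doMC)
registerDoMC(cores = 2)  # assumes at least 2 cores are available

# Stand-in for an expensive loop body
slowTask <- function(i) {
  Sys.sleep(0.5)
  sqrt(i)
}

# Sequential: roughly 8 * 0.5 = 4 seconds
system.time(for (i in 1:8) slowTask(i))

# Parallel over 2 cores: roughly half that, despite foreach's overhead
system.time(foreach(i = 1:8) %dopar% slowTask(i))
```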

Peter Fine
  • Right - that should explain at least part of the difference! I'm still not sure if it explains all of it though; see the update to my question! – MånsT May 02 '12 at 16:24