
I've read that the correct way to do nested foreach loops in R is via the nesting operator %:% (e.g. https://cran.r-project.org/web/packages/foreach/vignettes/nested.html).

However, code can't be added between the inner and outer loops when using this approach -- see example below.

Is there a way to create nested, parallelised foreach loops such that code can be added between the inner and outer loops?

More generally, is there anything wrong with the obvious way that springs to mind, namely simply nesting two foreach loops with the %dopar% operator instead of the %:% operator? See trivial example below.

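library(foreach)
library(doParallel)  # provides registerDoParallel() and attaches parallel (makeCluster, stopCluster)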

# Set up backend
cl = makeCluster(6)
registerDoParallel(cl)
on.exit(stopCluster(cl))

# Run nested loop with '%:%' operator. Breaks if adding code between the inner and outer loops 
foreach(i=1:2) %:% 
  # a = 1 #trivial example of running code between outer and inner loop -- throws error 
  foreach(j = 1:3) %dopar% {
    i * j
  }

# Run nested loop using 2 '%dopar%' statements -- is there anything wrong with this?
foreach(i=1:2, .packages = 'foreach') %dopar% {
  a = 1 #trivial example of running code between outer and inner loop
  foreach(j = 1:3) %dopar% {
    i * j
  }
}
jruf003
  • I would avoid trying to run nested parallel loops; one should parallelize either the inner or the outer loop, not both. If you have 6 cores and all 6 are used for the outer loop, then how are there any cores left over to parallelize the inner loop? – Dave2e May 17 '21 at 02:47
  • I guess you can do that with loops of futures from R package {future}. – F. Privé May 17 '21 at 06:00

1 Answer


The section "Using %:% with %dopar%" of the documentation you linked gives a useful hint:

all of the tasks are completely independent of each other, and so they can all be executed in parallel

The %:% operator turns multiple foreach loops into a single loop. That is why there is only one %do% operator in the example above. And when we parallelize that nested foreach loop by changing the %do% into a %dopar%, we are creating a single stream of tasks that can all be executed in parallel.
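To make the "single stream of tasks" idea concrete, here is a minimal sketch of what %:% returns. It uses %do% so that it runs without a registered backend, which is an assumption made only for illustration; with %dopar% the same six (i, j) tasks would be handed to the workers.

```r
library(foreach)

# Sequential version (%do%) so no backend is needed;
# the nesting operator still merges both loops into one task stream.
res <- foreach(i = 1:2) %:%
  foreach(j = 1:3) %do% {
    i * j
  }

# One list element per outer iteration, each holding the inner results
str(res)
length(res)    # number of outer iterations
lengths(res)   # number of inner results per outer iteration
res[[2]][[3]]  # value for i = 2, j = 3
```

The result keeps the nested shape (a list of lists), even though the iterations themselves are scheduled as one flat set of tasks.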

When you combine two %dopar% operators and measure the execution time, you see that only the outer loop is executed in parallel, which is probably not what you're looking for:

system.time(
foreach(i=1:2, .packages = 'foreach') %dopar% {
  # Outer calculation
  Sys.sleep(.5)
  foreach(j = 1:3) %dopar% {
    # Inner calculation
    Sys.sleep(1)
  }
})
#  user      system     elapsed 
#  0.00        0.00        3.52 

This elapsed time reflects:

parallel[ outer(0.5s) + sequential [3 * inner(1s)] ] ~ 3.5s

If the outer calculation is not too long, putting it into the inner loop is actually faster because all 6 workers from your example are used:

system.time(res <- foreach(i=1:2, .packages = 'foreach') %:%
  foreach(j = 1:3) %dopar% {
    # Outer calculation
    Sys.sleep(.5)
    # Inner calculation
    Sys.sleep(1)
  })
#  user      system     elapsed 
#  0.02        0.02        1.52 
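As a rough sanity check on the two timings above, the expected wall-clock time can be estimated by hand. The helpers below are hypothetical (they are not part of foreach) and assume zero scheduling overhead, equal task durations, and tasks that run in full waves of `workers`.

```r
# Hypothetical helper: two nested %dopar% loops.
# Outer iterations are spread over the workers; each one then runs
# its inner loop sequentially.
est_nested_dopar <- function(n_outer, n_inner, t_outer, t_inner, workers) {
  ceiling(n_outer / workers) * (t_outer + n_inner * t_inner)
}

# Hypothetical helper: %:% with the outer work moved into the body.
# There are n_outer * n_inner independent tasks, each paying both costs.
est_flat <- function(n_outer, n_inner, t_outer, t_inner, workers) {
  ceiling(n_outer * n_inner / workers) * (t_outer + t_inner)
}

est_nested_dopar(2, 3, 0.5, 1, workers = 6)  # 3.5
est_flat(2, 3, 0.5, 1, workers = 6)          # 1.5
```

These estimates line up with the measured 3.52 s and 1.52 s; real runs add some overhead for starting tasks and collecting results.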

If the outer calculation is too long and you have many more inner iterations than outer iterations, you could precalculate the outer loop in parallel and then use the result within %:%:

system.time({
  precalc <- foreach(i=1:2) %dopar% {
    # Outer pre-calculation
    Sys.sleep(2)
    i
  }
  foreach(i=1:2, .packages = 'foreach') %:%
    foreach(j = 1:12) %dopar% {
      # Inner calculation
      Sys.sleep(1)
      precalc[[i]]*j
    }
})
#   user  system elapsed 
#   0.11    0.00    5.25 

This is faster than:

system.time({
  foreach(i=1:2, .packages = 'foreach') %:%
    foreach(j = 1:12) %dopar% {
      # Outer calculation
      Sys.sleep(2)
      
      # Inner calculation
      Sys.sleep(1)
      i*j
    }
})

#   user  system elapsed 
#   0.13    0.00    9.21
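The two variants differ only in how often the outer step runs, not in what they return. A minimal sequential sketch of that equivalence, where `outer_step` is a hypothetical stand-in for the expensive outer calculation and %do% is used so no backend is required:

```r
library(foreach)

# Hypothetical stand-in for the expensive outer calculation
outer_step <- function(i) i^2

# Variant A: outer step computed once per i, then reused inside %:%
precalc <- foreach(i = 1:2) %do% outer_step(i)
res_a <- foreach(i = 1:2) %:%
  foreach(j = 1:3) %do% {
    precalc[[i]] * j
  }

# Variant B: outer step repeated inside every (i, j) task
res_b <- foreach(i = 1:2) %:%
  foreach(j = 1:3) %do% {
    outer_step(i) * j
  }

identical(res_a, res_b)
```

Variant A calls `outer_step` once per outer iteration, variant B once per (i, j) pair, which is where the time difference comes from when the outer step is slow.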
Waldi
  • Thanks @Waldi, this is very helpful. Just want to clarify two points. 1) When you say "When you combine two %dopar% and measure execution time, you see that only the outer loop is executed in parallel", do you know for a fact that this is true or is it just based on slow execution time? 2) My primary question was whether there's a way to create nested, parallelised foreach loops such that code can be added between the inner and outer loops. Am I correct in assuming the short answer is "no -- you need to precalculate it"? Much appreciated – jruf003 May 27 '21 at 03:28
  • @jruf03, Regarding question 1), this is a fact. I specifically used `Sys.sleep(0.5)` for outer loop and `Sys.sleep(1)` for inner loop so that we can check what is happening : 2 cores are running in parallel [outer(0.5s) + not parallel [3 * inner(1s)]] ~ 3.5s. This calculation holds even if we increase the number of outer loops (up to the 6 cores available). – Waldi May 27 '21 at 05:13
  • Regarding question 2), the short answer is that I didn't manage [to use futures](https://github.com/HenrikBengtsson/future/issues/95) to launch sub-processes inside main processes as suggested in previous comments. I think this could work but wasn't able to prove it, so I went for the pre-calculation workaround, which as a side effect makes clear that the key to parallelization is to keep **tasks independent from each other**. – Waldi May 27 '21 at 05:13