
I am doing some heavy computations that I would like to speed up by running them in a parallel loop. Moreover, I want the result of each calculation to be assigned to a variable in the global environment, named after the data element currently being processed:

fun <- function(arg) {
    assign(arg, arg, envir = .GlobalEnv)
}

For loop

In a simple for loop, that looks like the following, and it works just fine:

for_fun <- function() {
    data <- letters[1:10]
    for(i in 1:length(data)) {
        dat <- quote(data[i])
        call <- call("fun", dat)
        eval(call)
    }
}

# Works as expected
for_fun()

In this function, I first get some data, loop over it, and quote each element (although this is not strictly necessary) so it can be used in a function call. In reality, the function name is also dynamic, which is why I build the call this way.
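Because the function name is dynamic, base R's do.call() may be a simpler way to call a function whose name is only known as a string. A minimal sketch, assuming the name arrives as a character string; the fun_name parameter and the do_call_fun wrapper are hypothetical and not part of the original code:

```r
# The function from the question: creates a global variable named after `arg`
fun <- function(arg) {
    assign(arg, arg, envir = .GlobalEnv)
}

# Hypothetical wrapper: `fun_name` holds the dynamic function name
do_call_fun <- function(fun_name = "fun") {
    data <- letters[1:10]
    for (i in seq_along(data)) {
        # do.call() takes the function name as a string and the arguments
        # as a list, so no quote()/call()/eval() round trip is needed
        do.call(fun_name, list(data[i]))
    }
}

do_call_fun()
```

This only replaces the call construction; it runs sequentially and does not by itself change anything about the parallel behaviour described below.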

Foreach

Now, I want to speed this up. My first thought was to use the foreach package (with a doParallel backend):

library(foreach)

foreach_fun <- function() {
    # Set up parallel backend
    cl <- parallel::makeCluster(parallel::detectCores())
    doParallel::registerDoParallel(cl)
    
    data <- letters[1:10]
    
    foreach(i = 1:length(data)) %dopar% {
        dat <- quote(data[i])
        call <- call("fun", dat)
        eval(call)
    }
    
    # Stop the parallel backend
    parallel::stopCluster(cl)
    doParallel::stopImplicitCluster()
}

# Error in { : task 1 failed - "could not find function "fun"" 
foreach_fun()

Replacing the whole quote-call-eval procedure with simply fun(data[i]) resolves the error, but still nothing gets assigned in my global environment.
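One way to see why nothing gets assigned: each parallel worker is a separate R process with its own global environment, so assign(..., envir = .GlobalEnv) writes into the worker's global environment, not into the main session's. A minimal sketch using the base parallel package (swapped in for foreach/doParallel so it needs no extra packages; the res_ prefixed names are hypothetical and only chosen to avoid clashing with existing objects):

```r
library(parallel)

# Same function as in the question
fun <- function(arg) {
    assign(arg, arg, envir = .GlobalEnv)
}

cl <- makeCluster(2)
# Copy `fun` into each worker's global environment; the wrapper below looks
# it up there, so without this export the tasks would fail with
# 'could not find function "fun"'
clusterExport(cl, "fun")

# Hypothetical names, prefixed so they cannot clash with existing objects
data <- paste0("res_", letters[1:3])
invisible(parLapply(cl, data, function(x) fun(x)))

# Is each name defined in *this* session's global environment?
on_master <- vapply(data, exists, logical(1), envir = .GlobalEnv, inherits = FALSE)
# Which of the names exist in the workers' global environments?
on_workers <- unique(unlist(clusterEvalQ(cl, ls(pattern = "^res_"))))

stopCluster(cl)

print(on_master)    # all FALSE: the assignments never reached this session
print(on_workers)   # the names were created on the workers instead
```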

Future

To ensure it wasn't a problem with the foreach package, I also tried the future package (although I am not familiar with it).

library(future)

future_fun <- function() {
    # Plan a parallel future
    cl <- parallel::makeCluster(parallel::detectCores())
    future::plan(cluster, workers = cl)
    
    data <- letters[1:10]
    
    # Create an explicit future
    future(expr = {
        for(i in 1:length(data)) {
            dat <- quote(data[i])
            call <- call("fun", dat)
            eval(call)
        }
    })

    # Stop the parallel future
    parallel::stopCluster(cl)
    future::plan(sequential)
}

# No errors, but nothing is assigned;
# probably the future's value is never retrieved
future_fun()

Forcing the future to be evaluated (f <- future(...); value(f)) triggers the same "could not find function "fun"" error as with foreach.
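The lookup failure can be reproduced locally without a cluster. My understanding (an assumption, not verified against either package's source here) is that foreach and future decide what to ship to the workers by statically scanning the loop expression for free variables. In call("fun", dat), "fun" is only a string literal, so such a scan has no symbol `fun` to find. A minimal sketch that uses codetools::findGlobals (codetools ships with R as a recommended package) as a stand-in for that scan, and a bare environment as a stand-in for a worker process:

```r
fun <- function(arg) {
    assign(arg, arg, envir = .GlobalEnv)
}

# The loop body from the question, and the direct-call variant
body_via_call <- function() {
    dat <- quote(data[i])
    call <- call("fun", dat)
    eval(call)
}
body_direct <- function() fun(data[i])

# Static scan for free variables in each version
g_call <- codetools::findGlobals(body_via_call)
g_direct <- codetools::findGlobals(body_direct)
print("fun" %in% g_call)     # FALSE: "fun" only appears as a string
print("fun" %in% g_direct)   # TRUE: `fun` is a real symbol in the call

# Evaluate the constructed call in an environment that sees only base R,
# mimicking a worker that never received `fun`
worker_like_env <- new.env(parent = baseenv())
assign("data", letters[1:10], envir = worker_like_env)
assign("i", 1L, envir = worker_like_env)
res <- tryCatch(
    eval(call("fun", quote(data[i])), envir = worker_like_env),
    error = function(e) conditionMessage(e)
)
print(res)
```

This would also be consistent with fun(data[i]) working: written directly, `fun` is a visible symbol that an automatic export can pick up.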

Summary

In short, my questions are:

  1. How do you assign variables to the global environment in a parallel loop?
  2. Why does the function lookup fail?
koenniem
  • "How do you assign variables to the global environment in a parallel loop?" You don't. The workers have no access to the global environment. You really need to use functional programming for parallelization in R. – Roland Nov 17 '20 at 14:00
  • For `foreach`, you can use `.export` to explicitly export your functions to the workers. Also, I would use the `iterators` package to not export the complete `data` object to all workers and then subset it, but only export the part of the data the specific worker is using – starja Nov 17 '20 at 14:05
  • I guess the function lookup fails because the functions are not directly used but only via `call` (but have not checked this assumption) – starja Nov 17 '20 at 14:06
  • Can you put the output of the parallel process into a list? Then afterwards use `list2env`. – Michael Dewar Nov 17 '20 at 16:26
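The approach suggested in the comments (have each worker return its result, then create the variables in the main session with list2env()) might look like this. A minimal sketch, again using base parallel instead of foreach/future; with foreach, the analogous pieces would be the .export argument and a list of results. fun_value and fun_name are hypothetical stand-ins for the real computation and the dynamic function name:

```r
library(parallel)

# Hypothetical variant of `fun` that *returns* its result instead of
# assigning it; stands in for the real heavy computation
fun_value <- function(arg) {
    toupper(arg)
}

fun_name <- "fun_value"   # the dynamic function name, as a string
data <- letters[1:10]

cl <- makeCluster(2)
# Ship the function to the workers (foreach's counterpart is .export)
clusterExport(cl, "fun_value")

# Each task calls the function by name and returns its value
results <- parLapply(cl, data, function(x, f) do.call(f, list(x)), f = fun_name)
stopCluster(cl)

# Name the results after the data and assign them in the main session
names(results) <- data
invisible(list2env(results, envir = .GlobalEnv))

print(c(a, j))
```

Keeping the workers free of side effects and doing the single assignment step on the main process avoids depending on any worker's global environment.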

0 Answers