I'd like to parallelize repeated, independent function calls in which each call modifies the function's parent environment. Each execution of the function is independent; however, for various reasons I can't consider an implementation that doesn't rely on modifying the function's parent environment. See the simplified example below. Is there a way to pass a copy of the parent environment to each node? I am running this on a Linux system.

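create_fun <- function() {

    helper <- function(x, params) {x + params}
    # helper2 relies on `params` having a default, which master sets below
    helper2 <- function(z) {z + helper(z)}

    master <- function(y, a) {
        # environment created by create_fun, shared by all three functions
        parent <- parent.env(environment())
        # overwrite the default of `params` in helper, in place
        formals(parent[['helper']])$params <- a
        helper2(y)
    }

    return(master)
}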

# function to be called repeatedly
master <- create_fun()

# data to be iterated over
x <- expand.grid(1:100, 1:5)

# vector where output should be stored
results <- vector("numeric", nrow(x))

# task I'd like to parallelize
for(i in 1:nrow(x)){
    results[i] <- master(x[i,1], x[i, 2])
}
k13

1 Answer

Functions do maintain references to their parent (enclosing) environments, so the helpers travel with master. You can look at the contents of the environment of master (the environment created by create_fun):

ls(environment(master))
# [1] "helper"  "helper2" "master"
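
To illustrate why that reference matters for parallel work, here is a minimal sketch in base R. It assumes (as I understand it) that socket workers receive functions through R's serialization, which copies a closure's enclosing environment along with it unless that environment is the global environment or a package namespace. The names master_copy, same_env, copy_result, orig_result and copy_params are just for this illustration.

```r
# Same factory as in the question, repeated so the sketch is self-contained
create_fun <- function() {
    helper <- function(x, params) {x + params}
    helper2 <- function(z) {z + helper(z)}
    master <- function(y, a) {
        parent <- parent.env(environment())
        formals(parent[['helper']])$params <- a
        helper2(y)
    }
    return(master)
}

master <- create_fun()

# Round-trip through serialization, roughly what happens when a
# function is shipped to a worker process
master_copy <- unserialize(serialize(master, NULL))

# The copy carries its own copy of the enclosing environment
ls(environment(master_copy))
# [1] "helper"  "helper2" "master"
same_env <- identical(environment(master), environment(master_copy))
same_env
# [1] FALSE

# Each function mutates only its own environment
copy_result <- master_copy(1, 2)   # 1 + (1 + 2)
orig_result <- master(1, 5)        # 1 + (1 + 5)
copy_params <- formals(environment(master_copy)$helper)$params
c(copy_result, orig_result, copy_params)
# [1] 4 7 2
```

So each worker would modify its own private copy of the environment, which is effectively the "copy of the parent environment per node" asked for.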

Using %dopar% from the foreach package, you could do:

## Globals
master <- create_fun()
x <- expand.grid(1:100, 1:5)

## Previous (sequential) results
results <- vector("numeric", nrow(x))
for(i in 1:nrow(x)){
    results[i] <- master(x[i,1], x[i, 2])
}

library(parallel)
library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)

## parallel
res <- foreach(i=1:nrow(x), .combine = c) %dopar% {
    master(x[i,1], x[i,2])
}

all.equal(res, results)
# TRUE

## shut the workers down when finished
stopCluster(cl)
Rorschach
  • Unfortunately I'm at home, where I only have a Windows computer, and can't run this on my Linux system. But essentially I'd be sending the task to 20 different servers. I'll see if this works tomorrow at work. Thanks a lot! Do you have any idea what's happening with the scoping in the backend? – k13 Jul 01 '15 at 01:52
  • I'll see if this works tomorrow, but I suppose the real question is this: if you define a master function and helpers in the global environment, it seems you still need to specify/export the helper functions to do parallel processing. This would suggest that the parent environment isn't copied along with a function. We shall see. – k13 Jul 01 '15 at 02:28
  • I ended up using the doSNOW package, but I think what you did would have worked as well, even if I passed the tasks to multiple R sessions as opposed to multiple cores on the same computer. – k13 Jul 02 '15 at 01:15
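
On the global-environment case raised in the comments, here is a rough sketch using only the base parallel package. The functions f and g are hypothetical stand-ins for a master function and its helper, both defined at top level. My understanding is that the global environment is not copied to workers, so g has to be exported explicitly, whereas helpers living in a closure's own environment (as with create_fun) come along automatically.

```r
library(parallel)

# Hypothetical helper and master function, both defined at top level
g <- function(x) x + 1
f <- function(x) g(x) * 2

cl <- makePSOCKcluster(2)

# Without exporting g, the workers are expected to fail to find it;
# capture the error message instead of stopping
without_export <- tryCatch(
    parLapply(cl, 1:3, f),
    error = function(e) conditionMessage(e)
)

# Export the helper from the current environment, then retry
clusterExport(cl, "g", envir = environment())
with_export <- unlist(parLapply(cl, 1:3, f))

stopCluster(cl)

with_export
# [1] 4 6 8
```

The two-worker cluster size is arbitrary; the same pattern should apply when the workers are R sessions on other machines.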