
I am developing an R package and trying to use parallel processing in it for an embarrassingly parallel problem. I would like to write a loop or functional that uses the other functions from my package. I am working on Windows, and I have tried parallel::parLapply and foreach::%dopar%, but I cannot get the workers (cores) to access the functions in my package. Here's an example of a simple package with two functions, where the second calls the first inside a parallel loop using %dopar%:

add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # import %dopar% so foreach doesn't need to be attached

  foreach::foreach(i = 1:m) %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  parallel::stopCluster(cl)
}

When I load the package with devtools::load_all() and call the slowadd function, the error Error in { : task 1 failed - "could not find function "add10"" is returned.

I have also tried explicitly initializing the workers with my package:

add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # import %dopar% so foreach doesn't need to be attached

  foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  parallel::stopCluster(cl)
}

but I get the error Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : worker initialization failed: there is no package called 'mypackage'.

How can I get the workers to access the functions in my package? A solution using foreach would be great, but I'm completely open to solutions using parLapply or other functions/packages.

nealmaker
  • I'm only familiar with parallel and not the dopar stuff, but for parallel you will want to add library calls / functions / data to each node with, for example, parallel::clusterEvalQ, clusterExport, etc. There are examples in the help pages: ?clusterExport (see the sketch after these comments). – user20650 Aug 21 '20 at 17:02
  • @user20650 I have tried using parallel and clusterExport, but have the same problem of my package not being found. I'm not sure if it's looking in the wrong environment (I've tried defining the environment explicitly) or if maybe it has something to do with the way package development works. I can successfully use other packages using, for example, parallel::clusterEvalQ(cl, library(dplyr)). – nealmaker Aug 21 '20 at 17:27
  • This is how id set it up: https://chat.stackoverflow.com/rooms/220232/neal – user20650 Aug 21 '20 at 17:45
  • 1
    Have you properly installed your package? – F. Privé Aug 21 '20 at 18:20
  • @F.Privé I guess I haven't properly installed it. I have been making changes, loading the package with devtools::load_all(), and then testing it. Should I be installing the package with devtools::install() every time instead? – nealmaker Aug 21 '20 at 20:16
  • 1
    Are you using RStudio? Install and Restart should be sufficient. – F. Privé Aug 21 '20 at 20:21
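
To make user20650's suggestion concrete, here is a minimal sketch of the parallel/parLapply route (my sketch, not code from the thread). It assumes mypackage has been installed, since, as the comments above report, clusterExport alone runs into the same lookup problem when the package is only loaded with devtools::load_all():

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always clean up the cluster

  # Attach the installed package on every worker ...
  parallel::clusterEvalQ(cl, library(mypackage))
  # ... or copy individual objects to the workers instead:
  # parallel::clusterExport(cl, "add10", envir = environment())

  parallel::parLapply(cl, 1:m, function(i) {
    Sys.sleep(1)
    add10(i)
  })
}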

2 Answers


I was able to initialize the workers with my package's functions, thanks to people's helpful comments. Once all of the functions needed inside the loop were exported in the NAMESPACE and the package was installed with devtools::install(), foreach was able to find the package for worker initialization. The R script for the example would look like this:

#' @export
add10 <- function(x) x + 10

#' @export
slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # import %dopar% so foreach doesn't need to be attached

  out <- foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  parallel::stopCluster(cl)
  return(out)
}

This is working, but it's not an ideal solution. First, it makes for a much slower workflow: I was using devtools::load_all() every time I made a change to the package and wanted to test it (before incorporating parallelism), but now I have to reinstall the package after every change, which is slow when the package is large. Second, every function that is needed in the parallel loop has to be exported so that foreach can find it. My actual use case has a lot of small utility functions which I would rather keep internal (a possible workaround is sketched below).
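
A possible workaround for the second drawback (my sketch, not part of the original answer): since .packages = 'mypackage' loads the installed package on each worker, non-exported functions should still be reachable with the ::: operator, which would let the utilities stay internal:

out <- foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
  Sys.sleep(1)
  # ::: looks up non-exported objects in the package namespace,
  # so add10 would not need an @export tag
  mypackage:::add10(i)
}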

nealmaker

You can call devtools::load_all() inside the foreach loop, or load the functions you need with source():

out <- foreach::foreach(i = 1:m) %dopar% {
  Sys.sleep(1)
  source("R/some_functions.R")  # load the needed functions on this worker
  load("R/sysdata.rda")         # load the package's internal data
  add10(i)
}
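
For the devtools::load_all() variant mentioned above, a minimal sketch (pkg_path is an assumed absolute path to the package source, and devtools must be installed where the workers run):

pkg_path <- normalizePath(".")  # adjust to the package's source directory
out <- foreach::foreach(i = 1:m) %dopar% {
  devtools::load_all(pkg_path)  # loads exported and internal functions alike
  Sys.sleep(1)
  add10(i)
}

Re-loading the package on every iteration is wasteful; with an explicit cluster it could instead be done once per worker, e.g. parallel::clusterCall(cl, devtools::load_all, pkg_path).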