
I am trying to run my R code in parallel. Following is a toy example in which the function myfunc returns a number.

library(snowfall); 
sfInit(parallel=TRUE,cpus=5)
a <- 1 : 10000
sfExport("a")
parwrapper <- function(i){
        mysimulation <- myfunc(b=30,c=a[i])
        return(mysimulation)}
sfSapply(1:10000, parwrapper)

This is the error that I get:

Error in checkForRemoteErrors(val) : 5 nodes produced errors; first error: could not find function "myfunc"

Hello

1 Answer


Welcome to SO.

The error states the problem clearly: parwrapper calls a function myfunc, and this function is not defined on the worker nodes. In addition, you might have to export both objects: sfExport('myfunc', 'parwrapper').
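A minimal sketch of the fix, assuming myfunc is defined locally (the body below is a hypothetical stand-in, since the real myfunc is not shown in the question):

```r
library(snowfall)
sfInit(parallel = TRUE, cpus = 2)  # scaled down from 5 for a quick run

# Hypothetical stand-in for the real myfunc
myfunc <- function(b, c) b + c

a <- 1:10000
parwrapper <- function(i) {
  myfunc(b = 30, c = a[i])
}

# Export everything the workers need: the data AND both functions
sfExport("a", "myfunc", "parwrapper")
res <- sfSapply(1:10000, parwrapper)
sfStop()
```

The key point is that each worker runs in a fresh R session, so any object referenced inside parwrapper must be exported explicitly.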

Oliver
  • Thanks. Actually, myfunc is a function from one of the R packages. – Hello Jun 12 '20 at 20:47
  • Fair enough. It is not part of the `snowfall` library, so that was not clear from the example. In this case you'd have to use `sfLibrary([package name])` in order to load the library, with the function it contains, on the workers. Alternatively, the `foreach`, `future`, and `parallel` packages might be better at picking up dependencies automatically. – Oliver Jun 13 '20 at 00:24
  • Following is the error: `sfLibrary("rpact")` gives `Error in sfLibrary("rpact") : Stop: error loading library on slave(s): "rpact"` – Hello Jun 14 '20 at 16:42
  • Now this is where I will fall short. From the error it sounds like `rpact` is not installed. You could check whether each slave node has it installed using `sfClusterEval("rpact" %in% rownames(installed.packages()))`, which should return `TRUE` for each node if the package is installed. – Oliver Jun 14 '20 at 20:23
  • Yes it returns TRUE for each node. Is there a way to make it work? – Hello Jun 15 '20 at 01:51
  • I am sorry to say I am short of an answer here. I'd try another parallel package. With the foreach package it would be something like: `library(foreach); library(doParallel); registerDoParallel(cl <- parallel::makeCluster(5)); foreach(x = 1:10000, .combine = c, .multicombine = TRUE, .packages = 'rpact') %dopar% { parwrapper(x) }`. The parallel package would look similar to snowfall: `library(parallel); nc <- 5; cl <- makeCluster(nc); clusterEvalQ(cl, library(rpact)); parSapply(cl, 1:10000, parwrapper)`. – Oliver Jun 15 '20 at 06:37
  • Great. I will go with the foreach package. How many nodes do you recommend? In the above example I used 5. If I use 10, does it make a significant difference? – Hello Jun 15 '20 at 14:30
  • That depends on the task, computer, and system you're on. In general it comes with experience, but never use more than `parallel::detectCores() - 1` (mostly my default). There's a lot to parallel computing, and if your tasks can be [vectorized](https://win-vector.com/2019/01/03/what-does-it-mean-to-write-vectorized-code-in-r/), this is often simpler and faster than parallel computing. It comes with trial and error, and Stack Overflow has many well-written answers to parallel computing questions, especially on the `parallel` and `foreach` packages. – Oliver Jun 15 '20 at 15:50
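The pattern from the comments above, cleaned up using the base `parallel` package (which ships with R). The `rpact` line is left as a commented placeholder since that package may not be installed; `myfunc` is again a hypothetical stand-in:

```r
library(parallel)

nc <- 2  # keep modest; never more than detectCores() - 1
cl <- makeCluster(nc)

# Load the package that defines myfunc on every worker, e.g.:
# clusterEvalQ(cl, library(rpact))

# Hypothetical stand-in for a function supplied by the package
myfunc <- function(b, c) b + c

a <- 1:100
parwrapper <- function(i) {
  myfunc(b = 30, c = a[i])
}

# Export the data and the stand-in function to the workers
clusterExport(cl, c("a", "myfunc"))

# Note: the function is parSapply (there is no clusterSapply)
res <- parSapply(cl, 1:100, parwrapper)
stopCluster(cl)
```

Whether 5 or 10 workers helps depends on how long each call takes relative to the overhead of sending data to the workers; for very cheap per-call work, more workers can even be slower.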