I have created parallel workers (all running on the same machine) using:

MyCluster = makeCluster(8)

How can I make each of these 8 nodes source an R file I wrote? I tried:

clusterCall(MyCluster, source, "myFile.R")
clusterCall(MyCluster, 'source("myFile.R")')

and several similar versions, but none worked. Can you please help me find the mistake?

Thank you very much!

Bernd
  • The first version should work, the second is wrong because a string isn't a function. But why do you think the first one isn't working? Do you get an error message? – Steve Weston Feb 05 '14 at 20:40

2 Answers

The following code serves your purpose:

library(parallel)

cl <- makeCluster(4)
clusterCall(cl, function() { source("test.R") })

## do some parallel work

stopCluster(cl)

Also you can use clusterEvalQ() to do the same thing:

library(parallel)

cl <- makeCluster(4)
clusterEvalQ(cl, source("test.R"))

## do some parallel work

stopCluster(cl)

However, there is a subtle difference between the two methods. clusterCall() runs a function on each node, while clusterEvalQ() evaluates an expression on each node. If you have a variable list of files to source, clusterCall() is easier to use: clusterEvalQ(cl, expr) treats expr as an unevaluated expression, so a variable defined on the master is not substituted into it.
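
As a minimal sketch of the "variable list of files" case, the following passes a character vector of paths to clusterCall() as an ordinary argument. The two scripts, their contents (`a_value`, `b_value`), and the temporary directory are hypothetical and exist only to make the example self-contained:

```r
library(parallel)

# Hypothetical setup: write two small scripts to a temporary directory
# so the example can run on its own.
script_dir <- tempfile("scripts")
dir.create(script_dir)
files <- file.path(script_dir, c("a.R", "b.R"))
writeLines("a_value <- 1", files[1])
writeLines("b_value <- 2", files[2])

cl <- makeCluster(2)

# Extra arguments to clusterCall() are evaluated on the master and their
# values are sent to the workers, so `files` can be a normal variable.
invisible(clusterCall(cl, function(paths) {
  for (p in paths) source(p)  # source() evaluates in the worker's global environment
  NULL                        # return nothing bulky to the master
}, files))

# Check that the sourced objects now exist on every worker.
res <- clusterEvalQ(cl, a_value + b_value)

stopCluster(cl)
```

Doing the same with clusterEvalQ() would first require copying `files` to the workers, for example with `clusterExport(cl, "files")`, because the expression is evaluated in each worker's global environment.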

Kun Ren
  • Thank you very much, Ren. I successfully used your method: `clusterCall(cl, function() { source("test.R") })`. Let me add just one thing: I changed it to `output <- clusterCall(cl, function() { source("test.R") })` because otherwise it prints a lot of unnecessary information. – Bernd Mar 21 '14 at 13:22
  • Great. `invisible(...)` will also work to prevent explicit output. – Kun Ren Mar 21 '14 at 15:01
  • After some months I am still using this code for parallelization. Unfortunately, I have to use old packages from CRAN via checkpoints. I have opened another question to discuss this related problem here: http://stackoverflow.com/questions/37028653/r-set-checkpoint-on-worker-of-cluster – Bernd May 04 '16 at 13:28

If you use a command to source a local file, make sure the file is actually there on each node.

Otherwise, place the file on a network share or NFS, and source it by its absolute path.

Better still, and the standard answer: write a package, install it on each node, and then just call library() or require().
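
A rough sketch of the absolute-path approach, assuming all workers can reach the same file system. The script name, its contents (`my_fun`), and the temporary location are made up for illustration:

```r
library(parallel)

# Hypothetical script, created in a temp directory for illustration only.
script <- file.path(tempdir(), "myFile.R")
writeLines("my_fun <- function(x) x + 1", script)

# Resolve to an absolute path on the master before sending it to the workers.
abs_path <- normalizePath(script, winslash = "/", mustWork = TRUE)

cl <- makeCluster(2)

# Each worker reports its working directory and whether it can see the file.
seen <- clusterCall(cl, function(p) list(wd = getwd(), found = file.exists(p)), abs_path)
found <- vapply(seen, function(s) s$found, logical(1))
stopifnot(all(found))

# Source the file on every worker; invisible() hides the returned list.
invisible(clusterCall(cl, source, abs_path))

# Use a function defined by the sourced file.
out <- clusterCall(cl, function() my_fun(1))

stopCluster(cl)
```

Checking `file.exists()` on the workers first makes a wrong working directory or an unreachable path show up as a clear failure instead of a confusing error from source().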

Dirk Eddelbuettel
  • Sorry, I forgot to say that all workers are running on the same computer. They should only make use of different cores, so availability of the files should not be an issue. – Bernd Feb 05 '14 at 17:05
  • Well, I think I have working examples in my 'Intro to HPC with R' slides, so you can copy from there. The issue may still be the same: a different working directory for your workers. Try an absolute path. – Dirk Eddelbuettel Feb 05 '14 at 17:20
  • I am sorry to ask again, but I didn't manage to source my files. I tried the absolute path like this: `clusterCall(MyCluster, 'source("D:/folder/file.R")')` and `clusterCall(MyCluster, source, "D:/folder/file.R")`, but neither worked. The first prints a list of function definitions that I have never seen before. The second complains that the function "source("D:/folder/file.R")" cannot be found. – Bernd Feb 05 '14 at 18:10
  • Maybe you want `clusterEvalQ()` instead. I can't recall, but I already pointed you to working examples (albeit older, using `snow` not `parallel`). – Dirk Eddelbuettel Feb 05 '14 at 18:20