
I am trying to load some sources so that they are available on cluster nodes (I'm trying out something with clusterEvalQ from the "parallel" package). The problem I have is that, for some reason, some of the functions that are loaded normally when I simply use source() from within a script are not loaded when the file is sourced with clusterEvalQ(). I am trying to source my files, which contain multiple function definitions, on the compute nodes with clusterEvalQ() - apparently there is this "tail" check near the end of the source() function that prevents the last function from being loaded. How do I go about fixing that?
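Roughly, what I am doing looks like this (the file name is just a placeholder for my actual sources):

library(parallel)

cl <- makeCluster(4)

# "myFunctions.R" stands in for my real source file
clusterEvalQ(cl, source("myFunctions.R"))

After this, every function from the file is available on the nodes except the last one defined.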

I have seen that there is another question addressing a similar issue.

But my problem is different: it loads everything except the last thing defined in the source file.

Thank you for improving the formatting, guys - I rarely ask questions on Stack Overflow. My current workaround is to put a dummy empty function at the end of the most important source files.
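For illustration, a source file using that workaround would end something like this (the function names are made up):

# myFunctions.R (made-up names)
usefulFunction1 <- function(x) x + 1
usefulFunction2 <- function(x) x * 2

# dummy empty function so the useful functions above are not the last expression in the file
dummy <- function() NULL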

user1941126

1 Answer


The clusterEvalQ function is originally part of the snow package, which parallel uses to do some of its parallelization. In that package there are two functions typically used to "pass stuff to nodes". These are:

1) clusterEvalQ

This function is used to evaluate expressions on each node. Typically used to load packages through library or require. The snow documentation says:

clusterEvalQ evaluates a literal expression on each cluster node. It is a cluster version of evalq, and is a convenience function defined in terms of clusterCall.
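For example, loading a package on every node looks like this (the package is just an example):

library(parallel)
cl <- makeCluster(2)

# load a package on every node
clusterEvalQ(cl, library(MASS))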

I'm not sure how this would work for evaluating a source call on each node, because honestly I've never tried it. When I source functions I usually go with...

2) clusterExport

This function passes objects from the current workspace on to the nodes. It can be used in conjunction with source because sourced functions are part of the workspace just like any other object (you can source before setting up the cluster and then pass the sourced functions to the nodes):

clusterExport assigns the values on the master of the variables named in list to variables of the same names in the global environments of each node.

The list argument is actually a character vector of the names of the objects you want to pass on to the nodes. I usually take the lazy route (because I keep my workspace clean in the first place) and do:

clusterExport(localCl, list=ls())
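
Put together, and with made-up file and function names (someSourcedFunction is assumed to be defined in the sourced file), that could look something like this:

library(parallel)

# source the functions into the master workspace first
source("myFunctions.R")   # made-up file name

# remember what is in the workspace before the cluster object is created
toExport <- ls()

localCl <- makeCluster(4)

# pass the sourced objects to the nodes; passing the names positionally
# works with both snow and parallel (in parallel the argument is called varlist)
clusterExport(localCl, toExport)

# the sourced functions are now available on the nodes
res <- parLapply(localCl, 1:10, function(i) someSourcedFunction(i))

stopCluster(localCl)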

Hope this helps!

SimonG
  • I'm waiting for some benchmarks to finish before I try it and mark it as correct - it sounds quite promising! By browsing through the sources I've found out that near the end of the 'source()' function there is an explicit 'if(!tail){...}' branch under which the last thing declared ends up "invisible". Could someone enlighten us about why? I believe this would be really constructive knowledge to spread. Thank you for your answer. – user1941126 Aug 25 '14 at 14:13
  • The performance should not be such a problem, unless the files to `source` are very large. Using the `list` argument of `clusterExport` properly is a good way of reducing the number of objects that are passed to the nodes. Finally, if the computations are very time-consuming, then calling `clusterExport` will make up only a small fraction of the total computing time because it is only called once. – SimonG Aug 28 '14 at 16:09