R package future - why does a loop with remote workers hang the local R session?

Question

Please let me know if you need an example, but I don't think it is necessary.

I've written a for loop that creates futures and stores the result of each in a list. The plan is remote, say, made of 4 nodes on an internet machine.

After the 4th future is deployed and all cores of the remote machine are busy, R hangs until one of them is free. As I'm not using any of my local cores, why does it have to hang? Is there a way to change this behavior?

score 3 · Answer 1 · answered Apr 10 '20 at 03:44

Author of the future framework here. This behavior is by design.

Your main R session has a certain number of workers available. The number of workers depends on what future plan you have set up. You can always check the number of workers set up by calling nbrOfWorkers(). In your case, you have four remote workers, which means that nbrOfWorkers() returns 4.
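As a minimal sketch of that check: the plan below uses two local background sessions via plan(multisession) purely as a stand-in for the remote workers in the question, so the worker count of 2 is an assumption of this example, not the asker's setup.

```r
library(future)

# Assumption: two local background R sessions stand in for remote workers.
plan(multisession, workers = 2)

# Number of futures that can be active at once under the current plan.
n_workers <- nbrOfWorkers()
print(n_workers)

# Shut the background workers down again.
plan(sequential)
```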

You can have this number of futures (= nbrOfWorkers()) active at any time without blocking. When you attempt to create one more future, there are no more workers available to take it on. At this point, the only option is to block until one of the workers becomes free.
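The blocking can be observed with a small timing experiment. This is only a sketch: it assumes two local multisession workers instead of remote ones, and the two-second Sys.sleep() is a made-up stand-in for real work. The first two futures should be created almost immediately, while creating the third has to wait until a worker is free.

```r
library(future)

# Assumption: two local workers stand in for the remote ones.
plan(multisession, workers = 2)

t0 <- Sys.time()
fs <- list()
created_at <- numeric(3)

for (i in 1:3) {
  # Each future occupies one worker for about two seconds.
  fs[[i]] <- future({ Sys.sleep(2); i })
  # Record how long after the start this call to future() returned.
  created_at[i] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  cat(sprintf("future %d created after %.1f s\n", i, created_at[i]))
}

# Collecting the values blocks until each future is resolved.
vals <- lapply(fs, value)

plan(sequential)
```

Once all futures have been created, resolved() can be used to poll an individual future without blocking; it is the creation of a future beyond nbrOfWorkers() that waits.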

Now, it could be that you are asking: How can I make use of my local machine when the remote workers are all busy?

The easiest way to achieve this is by adding one or more local workers to the mix of remote workers. For example, if you allow yourself to use two workers on your local machine, you can do this as:

library(future)
remote_workers <- makeClusterPSOCK(c("n1.remote.org", "n2.remote.org"))
local_workers <- makeClusterPSOCK(2)
plan(cluster, workers = c(remote_workers, local_workers))

or even just

library(future)
remote_workers <- c("n1.remote.org", "n2.remote.org")
local_workers <- rep("localhost", times = 2)
plan(cluster, workers = c(remote_workers, local_workers))
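To confirm that futures are evaluated by the cluster workers rather than by the main session, one option is to have each future report the process ID it ran in. The sketch below assumes only "localhost" workers so that it runs without access to remote machines; with the mixed plan above, returning Sys.info()[["nodename"]] instead would show which host took each future.

```r
library(future)

# Assumption: localhost-only workers so the sketch runs without remote hosts.
plan(cluster, workers = rep("localhost", times = 2))

# Each future returns the process ID of the worker that evaluated it.
fs <- lapply(1:4, function(i) future(Sys.getpid()))
pids <- unlist(lapply(fs, value))

main_pid <- Sys.getpid()
print(unique(pids))  # worker process IDs
print(main_pid)      # process ID of the main R session

plan(sequential)
```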
