I'm trying to distribute jobs across 4 Linux worker nodes, each with 8 cores. I'm initiating the jobs from a Windows computer.
Here is how I set up the cluster:
worker_nodes <- c("node-1", "node-2")
ssh_private_key_file <- "C:/user/.ssh/id_rsa"
cl <-
  future::makeClusterPSOCK(
    worker_nodes,
    user = "user",
    rshopts = c(
      "-o", "StrictHostKeyChecking=no",
      "-o", "IdentitiesOnly=yes",
      "-i", ssh_private_key_file
    ),
    rscript = "/usr/bin/Rscript",
    homogeneous = FALSE,
    tries = 5
  )
This appears to work just fine. If I print the cl object, I get "Socket cluster with 2 nodes..".
Now I want to define how the jobs will be distributed using future::plan(). I want each node to get the same number of jobs, and I want each node to use all 8 cores to process its jobs in parallel.
Here is my plan:
future::plan(list(
  future::tweak(future::cluster, workers = cl),
  future::tweak(future::multisession, workers = 8)
))
Now if I create some jobs using:
furrr::future_map(input_values, slow_function)
I only see one core being used on my cluster nodes.
If I recreate the cluster by duplicating the worker nodes:
worker_nodes <- rep(c("node-1", "node-2"), 8)
then I see more cores being used on the cluster nodes, and the multisession part seems to be ignored.
But the first approach paired with "multisession" should also result in multiple R processes running and multiple cores being used.
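For context, the levels of a nested plan are consumed one at a time: futures created at the top level use the first entry (the cluster), and the second entry (multisession) is only used by futures that are created inside another future. A single furrr::future_map() call creates top-level futures only, so the second level is never reached. The following is a minimal sketch of a call shape that would reach it, assuming the two-level plan above is already active. Here n_nodes and the toy input_values and slow_function are hypothetical stand-ins, and the chunking scheme is just one possible choice.

```r
library(furrr)  # also attaches future

# Hypothetical stand-ins for the objects in the question.
input_values  <- 1:32
slow_function <- function(x) { Sys.sleep(0.01); x^2 }
n_nodes       <- 2  # assumption: one chunk per cluster node

# Split the inputs into n_nodes roughly equal, contiguous chunks.
chunk_id <- ceiling(seq_along(input_values) * n_nodes / length(input_values))
chunks   <- split(input_values, chunk_id)

# Outer map: one future per chunk, resolved by the first plan level
# (the cluster). Inner map: futures created inside each outer future,
# resolved by the next plan level (multisession on that node).
nested <- future_map(chunks, function(chunk) {
  furrr::future_map(chunk, slow_function)
})

# Flatten back to one result per input, in the original order.
results <- unlist(nested, recursive = FALSE, use.names = FALSE)
```

With no plan set, the same code simply runs sequentially, which makes it easy to check the results before trying it on the cluster.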
Also, I believe 4 of the 8 cores are slower than the other 4, so the computation time using the "rep worker nodes" method seems to be bound by the slower 4 cores. I'm wondering if the "multisession" part that doesn't seem to be working might resolve this issue.
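On the slow-core point, one possibly relevant detail is that furrr by default splits the input into one chunk per worker, so every worker receives the same number of elements regardless of its speed. A hedged sketch of asking for smaller chunks instead, so that workers that finish early can pick up more work, is shown below. The toy input_values and slow_function are hypothetical, and whether this helps in practice depends on the per-element overhead.

```r
library(furrr)  # also attaches future

# Hypothetical stand-ins for the objects in the question.
input_values  <- 1:32
slow_function <- function(x) { Sys.sleep(0.01); x^2 }

# chunk_size = 1 creates one future per element instead of one
# chunk per worker, so faster workers end up processing more elements.
balanced <- future_map(
  input_values,
  slow_function,
  .options = furrr_options(chunk_size = 1)
)
```

The trade-off is more communication overhead per element, which matters most when each call to the function is short.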
How do I enable each node to use the "multisession" plan?