
I'm trying to distribute jobs across 4 Linux worker nodes, each with 8 cores. I'm initiating the jobs from a Windows computer.

Here is how I set up the cluster:

worker_nodes <- c("node-1", "node-2")

ssh_private_key_file <- "C:/user/.ssh/id_rsa"
cl <-
  future::makeClusterPSOCK(
    worker_nodes,
    user = "user",
    rshopts = c(
      "-o", "StrictHostKeyChecking=no",
      "-o", "IdentitiesOnly=yes",
      "-i", ssh_private_key_file
    ),
    rscript = "/usr/bin/Rscript",
    homogeneous = FALSE,
    tries = 5
  )

This appears to work just fine. If I print the `cl` object I get `Socket cluster with 2 nodes...`

Now I want to define how the jobs will be distributed using future::plan(). I want each node to get the same number of jobs, and I want each node to use all 8 cores to process the jobs in parallel.

Here is my plan:

future::plan(list(future::tweak(future::cluster, workers = cl), 
          future::tweak(future::multisession, workers = 8)))

Now if I create some jobs using:

furrr::future_map(input_values, slow_function)

I only see one core being used on my cluster nodes.

If I recreate the cluster by duplicating the worker nodes:

worker_nodes <- rep(c("node-1", "node-2"), 8)

then I see more cores being used on the cluster nodes, but the multisession part of the plan seems to be ignored.

I would expect the first approach, paired with `multisession`, to also start multiple R processes and use multiple cores.

Also, I believe 4 of the 8 cores are slower than the other 4, so the computation time with the `rep()` approach seems to be bound by the slower 4 cores. I'm wondering whether the multisession level, which doesn't seem to be working, might resolve this.

How do I enable each node to use the "multisession" plan?

Giovanni Colitti
    The first `future_map()` call distributes the `input_values` over the cluster defined by `cl`, but then all you are doing is calling `slow_function` on each cluster node. To also run in parallel on each node you need to call `future_map()` again in the inner function. Have a look at https://furrr.futureverse.org/articles/remote-connections.html#running-code-in-parallel-on-each-ec2-instance for an example. – Davis Vaughan Mar 26 '23 at 12:22
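A minimal sketch of what the comment describes, as I understand it: the outer `future_map()` hands one chunk of inputs to each node, and a second `future_map()` inside the function uses the next level of the plan. Everything here is an assumption for illustration: both levels run as local `multisession` workers so the example can be tried without SSH, `slow_function` and `input_values` are placeholder definitions, and wrapping the inner worker count in `I()` is meant to tell recent versions of parallelly to accept it inside a nested worker (I have not checked this on older versions).

```r
library(future)
library(furrr)

# Local stand-in for the two-level plan. On the real cluster the outer level
# would be future::tweak(future::cluster, workers = cl) and the inner level
# would request 8 workers instead of 2.
plan(list(
  tweak(multisession, workers = 2),    # outer level: one future per "node"
  tweak(multisession, workers = I(2))  # inner level: workers within each "node"
))

# Placeholder for the real workload.
slow_function <- function(x) {
  Sys.sleep(0.1)
  x^2
}

input_values <- 1:8
n_nodes <- 2

# Split the inputs into one contiguous chunk per outer worker.
idx <- parallel::splitIndices(length(input_values), n_nodes)
chunks <- lapply(idx, function(i) input_values[i])

# The outer call sends one chunk to each node. The inner call runs on that
# node and picks up the second level of the plan.
results_by_node <- future_map(chunks, function(chunk) {
  furrr::future_map(chunk, slow_function)
})

# Flatten back to one result per input, in the original order.
results <- unlist(results_by_node, recursive = FALSE)

# Shut down the background workers.
plan(sequential)
```

With the plan from the question, a single `future_map(input_values, slow_function)` only ever uses the outer level, which would explain seeing one busy core per node.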

0 Answers