I am trying to run an R script on a single Linux machine with two CPUs, each with 8 physical cores. The R code determines the number of cores via detectCores(), subtracts one, and passes the result to makePSOCKcluster. According to the system's performance metrics, R only utilizes one of the CPUs and hyperthreads its cores. No workload is distributed to the second CPU.

If I specify detectCores(logical = FALSE), the load on the first CPU decreases, but the second one remains idle.

How do I fix this? Since the entire infrastructure is located in a single machine, Rmpi should not be necessary in this case.

FYI: the R script consists of foreach loops that rely on the doSNOW package.
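
For reference, the setup described above looks roughly like the following. This is a simplified sketch, not the original script: the iteration count and the loop body are placeholders, and `n_workers` and `res` are names chosen for illustration.

```r
library(doSNOW)   # attaches foreach and snow as dependencies

# One worker per detected core, minus one for the master process.
n_workers <- max(1L, parallel::detectCores() - 1L)
cl <- parallel::makePSOCKcluster(n_workers)
registerDoSNOW(cl)

# Placeholder loop; the real script generates one raster file per iteration.
res <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}

parallel::stopCluster(cl)
```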

Chr
  • Would you mind also posting the output of the actual hardware NUMA-discovery process, as reported by **`lstopo`**? – user3666197 Dec 29 '17 at 10:37

2 Answers


Try using makeCluster() and define the cluster type and size from a list of tasks, with one worker per task.
This works for me and runs each task on a different core/process.
Consider (if possible) defining each task separately instead of only using foreach.

Here is an example of what I'm using.
The result, out, is a list with the results from all workers, in the same order as the task list.

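library(parallel)

# Each element of tasks is a zero-argument function; one worker is started per task.
tasks <- list(task1, task2, ...)
cl <- makeCluster(length(tasks), type = "PSOCK")
# Load the required packages on every worker.
clusterEvalQ(cl, { library(dplyr); library(httr) })
# Copy the variables the tasks depend on to the workers.
clusterExport(cl, c("varname1", "varname2"), envir = environment())
# Run each task on its own worker; results are returned in task order.
out <- clusterApply(cl, tasks, function(f) f())
stopCluster(cl)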
Dror Bogin
  • I use nested `foreach` loops with more than 8000 iterations in total, each of which generates a raster file. Thus, there are far more tasks than workers. Each raster file relies on different shapefiles and bigger raster files. That is easy to model in a loop but does not appear feasible with the design above. – Chr Dec 29 '17 at 12:37

In my case, the solution is not to rely on snow. Instead, I launch the R script with mpirun and let that command manage the parallel environment from outside R. Accordingly, doSNOW needs to be replaced with doMPI.

With this setup both CPUs are adequately utilized.
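
A minimal sketch of such a setup, assuming Rmpi and doMPI are installed against a working MPI implementation. The file name `script.R`, the process count in the launch command, and the loop body are placeholders rather than the original script:

```r
# Launch from the shell, for example:
#   mpirun -n 16 Rscript script.R
# One of the started processes acts as the master, the others as workers.
library(doMPI)   # attaches foreach and Rmpi as dependencies

cl <- startMPIcluster()   # uses the processes started by mpirun
registerDoMPI(cl)

# Placeholder loop; the real script generates one raster file per iteration.
res <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}

closeCluster(cl)
mpi.quit()
```

The existing foreach loops stay as they are; only the cluster creation and backend registration change.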

Chr