R: makePSOCKcluster hyperthreads 50% of CPU cores

Question

I am trying to run an R script on a single Linux machine with two CPUs, each containing 8 physical cores. The R code determines the number of cores via detectCores(), subtracts one, and passes the result to makePSOCKcluster. According to the performance monitor, R only utilizes one of the CPUs and hyperthreads its cores. No workload is distributed to the second CPU.

If I specify detectCores(logical = FALSE), the observed load on the first CPU becomes smaller, but the second one is still inactive.

How do I fix this? Since the entire infrastructure is located in a single machine, Rmpi should not be necessary in this case.

FYI: the R script consists of foreach loops that rely on the doSNOW package.
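One way to check from inside R where the workers are allowed to run is sketched below. It is only a sketch under assumptions: it uses the base `parallel` package rather than doSNOW, it relies on the Linux-specific file `/proc/self/status`, and `read_affinity` and `worker_affinity` are hypothetical helper names that do not appear in the original script.

```r
library(parallel)

# Hypothetical helper: runs on a worker and reports its PID together with
# the kernel's "Cpus_allowed_list" line (Linux only; NA elsewhere).
read_affinity <- function() {
  status_file <- "/proc/self/status"
  allowed <- if (file.exists(status_file)) {
    grep("^Cpus_allowed_list", readLines(status_file), value = TRUE)
  } else {
    NA_character_
  }
  list(pid = Sys.getpid(), cpus_allowed = allowed)
}

# Hypothetical helper: start n PSOCK workers, query each one, then shut down.
worker_affinity <- function(n) {
  cl <- makePSOCKcluster(n)
  on.exit(stopCluster(cl), add = TRUE)
  clusterCall(cl, read_affinity)
}

# Worker count as described in the question: detected cores minus one.
n_cores   <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

info <- worker_affinity(n_workers)
str(info)
```

If every worker reports an allowed-CPU list that covers only the cores of the first socket, the restriction comes from the environment the R session was started in rather than from the cluster setup itself.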

Would you mind also posting the output of the actual hardware NUMA-discovery process, as reported by **`lstopo`**? — user3666197, Dec 29 '17 at 10:37

score 1 · Answer 1 · answered Dec 29 '17 at 10:51

Try using makeCluster() and define the cluster type and size from a list of tasks, with one worker per task.
This works for me and runs each task in a separate process on a different core.
Consider (if possible) defining each task separately as its own function instead of relying only on foreach.

Here is an example of what I'm using.
The result, out, is a list with the result of each task, in the same order as the task list.

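library(parallel)  # provides makeCluster, clusterEvalQ, clusterExport, clusterApply

# Each task is a zero-argument function defined elsewhere; "..." stands for further tasks
tasks <- list(task1, task2, ...)
# One PSOCK worker per task
cl <- makeCluster(length(tasks), type = "PSOCK")
# Load the required packages on every worker
clusterEvalQ(cl, { library(dplyr); library(httr) })
# Copy the variables the tasks need to every worker
clusterExport(cl, c("varname1", "varname2"), envir = environment())
# Run each task on its own worker; results come back in task order
out <- clusterApply(cl, tasks, function(f) f())
stopCluster(cl)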

I use nested `foreach` loops with more than 8000 iterations in total, each of which generates a raster file. Thus, there are many more tasks than workers. Each raster file relies on different shapefiles and bigger raster files. That is easy to model in a loop but does not appear feasible under the design stated above. — Chr, Dec 29 '17 at 12:37
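For reference, the case of many more tasks than workers can also be expressed with the base `parallel` package by flattening the nested loop indices into one table and letting a small cluster work through it. The following is only a minimal sketch: `make_one` is a hypothetical stand-in for the raster-generating loop body, and the 4 x 3 grid and the two workers are arbitrary illustration values.

```r
library(parallel)

# Hypothetical stand-in for the body of the nested loops.
make_one <- function(i, j) i * 10 + j

# Flatten the two loop indices into a single task table (4 x 3 = 12 tasks).
grid <- expand.grid(i = 1:4, j = 1:3)

# Deliberately fewer workers than tasks.
cl <- makePSOCKcluster(2)

# Dynamic scheduling hands the next task to whichever worker becomes free.
res <- clusterMap(cl, make_one, grid$i, grid$j, .scheduling = "dynamic")

stopCluster(cl)
```

The result `res` is a list with one element per row of `grid`, in row order, regardless of which worker processed which task.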

score 1 · Answer 2 · answered Jan 05 '18 at 13:19

In my case, the solution was not to rely on snow. Instead, I launch the R script with mpirun and let that command manage the parallel environment from outside R. doSNOW has to be replaced with doMPI accordingly.

With this setup both CPUs are adequately utilized.
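A minimal sketch of what such a script might look like is given below. It is an illustration under assumptions, not the actual script from this answer: the file name `script.R`, the process count of 16 in the launch command, and the placeholder loop body are all made up, and it requires a working MPI installation together with the Rmpi and doMPI packages.

```r
# script.R
# Launched from the shell rather than from an R session, for example:
#   mpirun -n 16 R --slave -f script.R
# (the process count 16 is an assumption for a 2 x 8-core machine)

library(doMPI)  # also attaches foreach and Rmpi

# When started under mpirun, the already-running MPI processes become the workers.
cl <- startMPIcluster()
registerDoMPI(cl)

# Placeholder loop body; the real script would generate one raster per iteration.
res <- foreach(i = 1:100, .combine = c) %dopar% {
  sqrt(i)
}

# Shut the workers down and leave MPI cleanly.
closeCluster(cl)
mpi.quit()
```

The existing `foreach` loops can stay as they are; only the backend registration changes from doSNOW to doMPI.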
