
I’m running a three-layer nested foreach loop but am unable to prevent the code from occupying 100% of the remote server (Linux, CentOS, 14 physical cores, 56 logical cores). The framework I use is:

library(doParallel)
doParallel::registerDoParallel(20)
outRes <- foreach1(i = seq1, …) %:% 
              foreach2(j = seq2, …) %dopar% {
                  innerRes <- foreach3(k = seq3, …)
              }

Three questions occur to me.

  1. For nested foreach loops, will the registered backend be passed to each foreach loop and actually result in 20*3 = 60 workers?
  2. What’s the mathematical relationship between the number of workers and the CPU utilization percentage?
  3. In my real case, foreach1 and foreach2 are small tasks, while foreach3 is a large one. As a result, the workers spend most of their time idle and waiting, which wastes them. Is there any solution to fix this?

PS: a reproducible code example is attached below.

library(mlbench)
library(caret)
data("Sonar")
str(Sonar)
table(Sonar$Class)

seed <- 1234
# for cross validation
number_outCV <- 10
repeats_outCV <- 10
number_innerCV <- 10
repeats_innerCV <- 10

# list of numbers of features to model
featureSeq <- c(10, 30, 50)
# for LASSO training
lambda <- exp(seq(-7, 0, 1))
alpha <- 1

dataList <- list(data1 = Sonar, data2 = Sonar, data3 = Sonar, data4 = Sonar, data5 = Sonar, data6 = Sonar)

# library(doMC)
# doMC::registerDoMC(cores = 20)
library(doParallel)
doParallel::registerDoParallel(20)

nestedCV <- foreach::foreach(clust = 1:length(dataList), .combine = "c", .verbose = TRUE) %:%
  foreach::foreach(outCV = 1:(number_outCV*repeats_outCV), .combine = "c", .verbose = TRUE) %dopar% {
    # prepare data
    dataset <- dataList[[clust]]
    table(dataset$Class)

    # split data into model developing and testing data in the outCV: repeated 10-fold CV
    set.seed(seed)
    ResampIndex <- caret::createMultiFolds(y = dataset$Class, k = number_outCV, times = repeats_outCV)
    developIndex <- ResampIndex[[outCV]]
    developX <- dataset[developIndex, !colnames(dataset) %in% c("Class")]
    developY <- dataset$Class[developIndex]

    testX <- dataset[-developIndex, !colnames(dataset) %in% c("Class")]
    testY <- dataset$Class[-developIndex]

    # get a pool of all the features
    features_all <- colnames(developX)

    # training model with inner repeated 10-fold CV
    # foreach for nfeature search
    nfeatureRes <- foreach::foreach(featNumIndex = seq(along = featureSeq), .combine = "c", .verbose = TRUE) %dopar% {
      nfeature <- featureSeq[featNumIndex]
      selectedFeatures <- features_all[1:nfeature]

      # train LASSO
      lassoCtrl <- trainControl(method = "repeatedcv", 
                                number = number_innerCV, 
                                repeats = repeats_innerCV, 
                                verboseIter = TRUE, returnResamp = "all", savePredictions = "all", 
                                classProbs = TRUE, summaryFunction = twoClassSummary)
      lassofit.cv <- train(x = developX[, selectedFeatures], 
                           y = developY, 
                           method = "glmnet",
                           metric = "ROC",
                           trControl = lassoCtrl, 
                           tuneGrid = expand.grid(lambda = lambda, alpha = alpha),
                           preProcess = c("center", "scale"))

      AUC.test <- pROC::auc(response = testY, predictor = predict(lassofit.cv, newdata = testX[, selectedFeatures], type = "prob")[[2]])
      performance <- data.frame(Class = clust, outCV = outCV, nfeature = nfeature, AUC.cv = max(lassofit.cv$results$ROC), AUC.test = as.numeric(AUC.test))
    }
    # end of nfeature search foreach loop
    nfeatureRes
  }
# end of outCV foreach loop as well as the dataList foreach loop
foreach::registerDoSEQ()
CcMango

3 Answers


If you want to make sure your code only uses a certain number of cores, you can pin your process to specific cores. This is called "CPU affinity" and in R you can use parallel::mcaffinity to set it, e.g.:

parallel::mcaffinity(1:20)

to allow your R process to use only the first 20 cores. This works regardless of other libraries used inside this process, because it invokes OS-level control over resources (some rare libraries spawn or communicate with other processes, but your code doesn't seem to use anything like that).
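
A minimal sketch, assuming a Linux host and the fork-based backend that registerDoParallel(20) selects on Unix (forked workers inherit the master's affinity mask):

    library(doParallel)
    # OS-level limit: pin the master R process to the first 20 logical cores
    parallel::mcaffinity(1:20)
    # workers forked afterwards inherit the affinity, so the whole computation
    # stays on those 20 cores no matter how many tasks get scheduled
    doParallel::registerDoParallel(20)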

%:% is the right way to nest foreach loops: the foreach package considers both the inner and outer loop in its scheduling and executes only as many inner bodies at a time as you registered with registerDoParallel (20 here), whether they come from the same outer-loop iteration or not. The wrong way would be e.g. foreach(…) %dopar% { foreach(…) %dopar% { … } }, which would spawn the square of the registered worker count in simultaneous computations (so 400 in your case). foreach(…) %do% { foreach(…) %dopar% { … } } (or the other way around) would be better, but still suboptimal. See foreach's nesting vignette for details.

In your case it would probably be best to keep the two outer loops as they are now (%:% and %dopar%) and change the inner loop to %do%. The two outer loops still give you plenty of iterations in total to fill 20 cores, and the common rule is that it is better to parallelize outer loops than inner ones when possible.
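
A minimal sketch of that structure, reusing the objects defined in the question (dataList, number_outCV, repeats_outCV, featureSeq) and eliding the loop bodies:

    library(doParallel)
    doParallel::registerDoParallel(20)

    nestedCV <- foreach::foreach(clust = 1:length(dataList), .combine = "c") %:%
      foreach::foreach(outCV = 1:(number_outCV * repeats_outCV), .combine = "c") %dopar% {
        # ... prepare data and split folds exactly as in the question ...

        # the feature-number search now runs sequentially inside each of the
        # 20 workers, so at most 20 model fits are running at any moment
        nfeatureRes <- foreach::foreach(featNumIndex = seq(along = featureSeq), .combine = "c") %do% {
          # ... train the LASSO and build the performance data.frame as in the question ...
        }
        nfeatureRes
      }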

liori

After many experiments, my guess is that foreach() forks workers as follows:

  1. If nested foreach is used (e.g. foreach() %:% foreach() %dopar% {}): the number of workers (logical CPU cores that share storage) forked will be the number of cores registered before foreach() multiplied by the number of foreach() calls. E.g.:

    registerDoMC(cores = 10)
    foreach() %:% foreach() %:% foreach() %dopar% {} # 10 x 3 = 30 workers will finally be forked in this example.
    
  2. If a foreach() is nested inside another foreach() without using %:%, the number of workers (logical CPU cores) forked will be the cores registered for the %:% part multiplied by the independent nested part. E.g.:

    registerDoMC(cores = 10)
    foreach() %:% foreach() %dopar% { foreach() } # (10 + 10) x 10 = 200 workers will finally be forked.
    

Any corrections are welcome if this is wrong.
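
Rather than guessing, the registered worker count can be read back from the backend. This is not proof of how many processes actually get forked, but it is the number foreach schedules against; the forked R processes themselves can be counted on the server with htop or ps while the loops run:

    library(doParallel)
    doParallel::registerDoParallel(10)
    foreach::getDoParWorkers()   # reports 10, the number foreach schedules on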

CcMango

I don't know if this fits your case, but you could try reducing the process's priority by running it with the nice command, so that even if it uses 100% CPU, it only takes CPU time that would otherwise be idle.
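
nice is normally applied when launching the script from the shell, e.g. nice -n 19 Rscript yourscript.R. A minimal sketch of the same idea from inside R, assuming Linux, using psnice() from the tools package that ships with base R (19 is the lowest priority; workers forked later inherit it):

    # lower the scheduling priority of the current R process before the
    # parallel loops start; forked workers inherit this value
    tools::psnice(Sys.getpid(), value = 19)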

Camion