I am running a model on an AWS instance with 36 cores. Dummy data example:

library(mlbench)
data(Sonar)
library(caret)
library(doParallel)
set.seed(95014)

# create training & testing data sets

inTraining <- createDataPartition(Sonar$Class, p = .75, list=FALSE)
training <- Sonar[inTraining,]
testing <- Sonar[-inTraining,]

# use the x / y interface because the formula interface performs poorly
x <- training[,-61]
y <- training[,61]

cl <- makePSOCKcluster(36)
registerDoParallel(cl) 
fitControl <- trainControl(method = "cv",
                           number = 5,
                           allowParallel = TRUE)
fit <- train(x, y, method = "cforest", trControl = fitControl)
stopCluster(cl)

But when I look at htop, only about half of the cores are busy. Is there a core restriction in doParallel or caret?

phiver
Hanjo Odendaal
  • Try method "ranger" and see what happens. ranger runs natively in parallel; cforest doesn't. There may not be enough tasks to run in parallel: your CV has only 5 folds, so that is also a limiting factor. – phiver Jun 13 '18 at 07:54
  • @phiver, switching to ranger and increasing the folds to 10 worked well! It's a pity about `cforest`, though. Is there any way to get around the restriction, or to find out more about why it limits the cores used? – Hanjo Odendaal Jun 13 '18 at 08:24
  • Looking at it: you had 5-fold CV, and caret uses a default tuneLength of 3 if you do not specify a tuneGrid, so 5 * 3 = 15 cores, which is exactly what your htop is showing. cforest itself doesn't run in parallel, so there is your bottleneck. If you want to make full use of your cores with cforest, you could use a tuneGrid for mtry with, e.g., 4 values and increase the CV to 9 folds, or use 6 mtry values with 5-fold CV, and you would use all (or, in the second case, almost all) of the cores. Of course using all the cores is nice, but it shouldn't be a goal. – phiver Jun 13 '18 at 09:40
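
The arithmetic in the last comment can be sketched in R. This is only an illustration, assuming (as the comment states) that caret hands out one task per combination of CV fold and tuning-grid row. `parallel_tasks` is a hypothetical helper, not a caret function, and the mtry values are arbitrary placeholders.

```r
# Hypothetical helper (not part of caret): number of independent tasks
# available to the workers, assuming one task per (fold, grid row) pair.
parallel_tasks <- function(folds, grid) folds * nrow(grid)

# Three candidate values, standing in for the default tuneLength of 3
# (placeholder mtry values).
default_grid <- expand.grid(mtry = c(2, 31, 60))
parallel_tasks(5, default_grid)   # 15

# Six mtry values with 5-fold CV (placeholder values).
grid_6 <- expand.grid(mtry = c(2, 5, 10, 20, 40, 60))
parallel_tasks(5, grid_6)         # 30

# Four mtry values with 9-fold CV (placeholder values).
grid_4 <- expand.grid(mtry = c(2, 10, 30, 60))
parallel_tasks(9, grid_4)         # 36
```

Under the same assumption, passing `tuneGrid = grid_4` to `train()` together with `number = 9` in `trainControl()` should give caret up to 36 tasks to distribute across the 36 workers. Whether that actually shortens the run has not been measured here.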
