
I'm tuning some random forest models using ranger in tidymodels. I have a fairly large dataset with many columns, so I set up a DigitalOcean droplet for tuning/training using instructions from Danny Foster's article, R on Digital Ocean. The system I'm using to train my models is running on Intel hardware and has 32 cores and 64 GB of RAM.


I use the following code to register a parallel backend before running the tuning:

library(tidymodels)   # provides tune_grid()

# register a foreach parallel backend with 28 workers
doParallel::registerDoParallel(cores = 28)
set.seed(345)

tune_res <- tune_grid(
  tune_wf,                  # workflow defined earlier (not shown)
  resamples = figs_folds,
  grid = 10
)

When I use the htop command to view processing on the system, I see that 17 to 18 of the 32 cores are consistently being utilized. This number is consistent with the doParallel documentation, which states that 50% of the cores are used if the number of cores is not specified. I figured 2 cores were set aside for other duties and my model was using the remaining 16.
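To rule out a registration problem, a quick sanity check is to ask the backend how many workers it actually registered (the expected values in the comments are my assumptions for this droplet):

# number of workers foreach will actually use
foreach::getDoParWorkers()    # I would expect 28 here

# number of cores the system reports
parallel::detectCores()       # should be 32 on this droplet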

Therefore it seems like something must be wrong with how I am specifying the number of cores:

doParallel::registerDoParallel(cores = 28)

How should I specify the number of cores to use in doParallel?
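For reference, the doParallel documentation describes two ways to register a backend. I have only used the cores= form above; the cluster-based form below is a sketch I have not tried on this droplet:

library(doParallel)   # also attaches parallel and foreach

# option 1: fork-based workers (Unix-alikes only)
registerDoParallel(cores = 28)

# option 2: explicit PSOCK cluster (portable, needs manual cleanup)
cl <- makePSOCKcluster(28)
registerDoParallel(cl)
# ... run tune_grid() here ...
stopCluster(cl)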

EDIT/UPDATE: When I switched from tuning a random forest model to training a logistic regression, all of the cores and much more of the available memory were utilized. Here is the htop image from the logistic regression model training:

[htop screenshot during logistic regression training]

As you can see, all of the cores are utilized along with approximately 46 GB of RAM. Is this an indication that different models can utilize cores in different ways?
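One possibility I wonder about is ranger's own internal threading interacting with the foreach workers. As a hypothetical sketch (this is not my actual tune_wf; num.threads = 1 is an assumption that hands all parallelism to tune_grid()):

library(tidymodels)

rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  # keep ranger single-threaded so the 28 foreach workers
  # don't oversubscribe the 32 cores
  set_engine("ranger", num.threads = 1) %>%
  set_mode("classification")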

Mutuelinvestor
  • Do you have a gist or some other place to show the code? A lot depends on how you are doing things and the code here is a bit minimal. – topepo Apr 15 '21 at 14:27

0 Answers