I'm tuning some random forest models using ranger in tidymodels. I have a fairly large dataset with many columns, so I set up a Digital Ocean droplet for tuning/training using the instructions from Danny Foster's article, R on Digital Ocean. The system I'm using to train my models runs on Intel hardware and has 32 cores and 64 GB of RAM.
I run the following code before tuning to enable parallel processing:
doParallel::registerDoParallel(cores=28)
set.seed(345)
tune_res <- tune_grid(
  tune_wf,
  resamples = figs_folds,
  grid = 10
)
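For reference, here is a quick sanity check I can run to confirm what the backend actually registered (these are standard parallel/foreach calls, nothing specific to my setup):

# How many logical cores does R see on the droplet?
parallel::detectCores()
# How many workers did foreach register with the current backend?
foreach::getDoParWorkers()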
When I use the htop command to view activity on the system, I see that only 17 to 18 of the 32 cores are consistently being utilized. This number seems consistent with the doParallel documentation, which states that 50% of the cores are used if cores is not specified. I figure 2 cores are set aside for other duties and my model is using the remaining 16.
It therefore seems like something is wrong with how I am specifying the number of cores to use:
doParallel::registerDoParallel(cores=28)
How should I specify the number of cores to use in doParallel?
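For completeness, this is the explicit cluster-based registration I have also seen suggested; I have not verified that it behaves any differently here, and 28 is just the same worker count I was trying to request:

# Create a PSOCK cluster explicitly, register it, and stop it after tuning
cl <- parallel::makePSOCKcluster(28)
doParallel::registerDoParallel(cl)
# ... run tune_grid() here ...
parallel::stopCluster(cl)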
EDIT UPDATE: When I switched from tuning a random forest model to training a logistic regression, all of the cores and much more of the available memory were utilized. Here is the htop image from the logistic regression model training:
As you can see, all of the cores are utilized, along with approximately 46 GB of RAM. Is this an indication that different models can utilize cores in different ways?