
I am using the package "tuneRanger" to tune a random forest model. It works well and I obtained good results, but I am not sure whether it is overfitting my model. I would like to use repeated CV for every configuration the package evaluates while tuning, but I can't find a way to do it. I would also like to know how the package validates the result of each try (train/test split, CV, repeated CV?). I have been reading the package documentation (https://cran.r-project.org/web/packages/tuneRanger/tuneRanger.pdf), but it says nothing about it.

Thank you for your help.


1 Answer


Out-of-bag estimates are used for estimating the error; I don't think you can switch to CV with that package. It's up to you to decide whether CV is better than this. In their readme they link to a publication, and in Section 3.5 of it they write:

Out-of-bag predictions are used for evaluation, which makes it much faster than other packages that use evaluation strategies such as cross-validation
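So with tuneRanger itself, each candidate configuration is scored on its out-of-bag predictions, not on a held-out resampling scheme. A minimal sketch of a typical call, assuming the iris data (parameter names follow the tuneRanger/mlr APIs; the number of iterations is illustrative):

```r
library(tuneRanger)
library(mlr)

# tuneRanger expects an mlr task, not a raw data frame
task <- makeClassifTask(data = iris, target = "Species")

# Tunes mtry, min.node.size and sample.fraction by default;
# the error reported for each try is the out-of-bag estimate
res <- tuneRanger(task, measure = list(multiclass.brier),
                  num.trees = 1000, iters = 70)

res$recommended.pars  # tuned hyperparameters and their OOB performance
```

There is no argument in `tuneRanger()` to replace the OOB evaluation with (repeated) cross-validation.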

If you want to use cross-validation or repeated cross-validation, you would have to use caret, for example:

library(caret)

# Tune ranger over a small grid, evaluating each candidate with
# 10-fold CV repeated 2 times
mdl = train(Species ~ ., data = iris, method = "ranger",
            trControl = trainControl(method = "repeatedcv", repeats = 2),
            tuneGrid = expand.grid(mtry = 2:3, min.node.size = 1:2,
                                   splitrule = "gini"))

Random Forest 

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  mtry  min.node.size  Accuracy  Kappa
  2     1              0.96      0.94 
  2     2              0.96      0.94 
  3     1              0.96      0.94 
  3     2              0.96      0.94 

Tuning parameter 'splitrule' was held constant at a value of gini
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini
 and min.node.size = 1.

The parameters you can tune will differ between packages. I think mlr also allows you to perform cross-validation, but the same limitations apply when you go through tuneRanger itself.
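For completeness, a hedged sketch of the mlr route mentioned above, tuning ranger with repeated CV directly (function and parameter names are from the mlr API; the grid bounds are illustrative, not recommendations):

```r
library(mlr)

task <- makeClassifTask(data = iris, target = "Species")
lrn  <- makeLearner("classif.ranger")

# Search space for the two hyperparameters tuned in the caret example
ps <- makeParamSet(
  makeIntegerParam("mtry", lower = 2, upper = 3),
  makeIntegerParam("min.node.size", lower = 1, upper = 2)
)

# 10-fold CV repeated 2 times, grid search over the space above
rdesc <- makeResampleDesc("RepCV", folds = 10, reps = 2)
res   <- tuneParams(lrn, task, resampling = rdesc, par.set = ps,
                    control = makeTuneControlGrid())

res$x  # best parameter combination found
```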
