I'm building a random forest with the caret package in R using method = "rf". Every random forest method in caret seems to tune only mtry, the number of features randomly sampled as split candidates at each node. I don't understand why the maximum depth of each tree (max_depth) is not a tunable parameter, as it is for CART. To my mind, it is a parameter that can limit overfitting. For example, my random forest performs much better on the training data than on the test data:

library(caret)

model <- train(
  group ~ ., data = train.data, method = "rf",
  trControl = trainControl("repeatedcv", number = 5, repeats = 10),
  tuneLength = 5
)


>         postResample(fitted(model),train.data$group)
Accuracy    Kappa 
0.9574592 0.9745841 

>         postResample(predict(model,test.data),test.data$group)
 Accuracy     Kappa 
0.7333333 0.5428571 

As you can see, my model is clearly overfitting. I have tried many different things to address this, but nothing has worked: I always get around 0.7 accuracy on the test data and 0.95 on the training data. This is why I want to tune other parameters.

I cannot share my data to reproduce this.
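Since the real data can't be shared, here is a minimal sketch on the built-in iris data (an assumption standing in for train.data) of one way tree size could still be constrained. caret's "rf" method only exposes mtry in its tuning grid, but train() forwards extra arguments through `...` to randomForest(), which accepts nodesize (minimum size of terminal nodes) and maxnodes (maximum number of terminal nodes). The loop below compares resampled accuracy across a few hypothetical nodesize values; it assumes the caret and randomForest packages are installed, and the candidate values are illustrative only.

```r
library(caret)
library(randomForest)

set.seed(42)
# iris is a stand-in for the private data (assumption: factor outcome)
data(iris)

# Fewer repeats than the original (10) just to keep this sketch fast
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

# Hypothetical candidate values; larger nodesize grows smaller trees
node_sizes <- c(1, 5, 10, 20)

fits <- lapply(node_sizes, function(ns) {
  train(
    Species ~ ., data = iris, method = "rf",
    trControl = ctrl,
    tuneLength = 3,  # caret still tunes mtry as usual
    ntree = 200,     # forwarded through ... to randomForest()
    nodesize = ns    # forwarded through ... to randomForest()
  )
})

# Best cross-validated accuracy (over mtry) for each nodesize value
cv_acc <- sapply(fits, function(f) max(f$results$Accuracy))
names(cv_acc) <- node_sizes
print(cv_acc)
```

The same pattern works with maxnodes in place of nodesize. Comparing resampled accuracy rather than accuracy on the training data avoids the optimistic estimate that a fit on the training set gives.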

0 Answers0