1) I fit a random forest regression on a training set of 185 rows with 4 independent variables. Two of them are categorical, with 3 and 13 levels respectively. The other two are continuous numeric variables.
I ran RF with 10-fold cross-validation repeated 4 times. (I didn't scale the dependent variable, which is why the RMSE is so large.)
I guess the reason mtry can be bigger than 4 is that the categorical variables have 3 + 13 = 16 levels in total. But if so, why doesn't that count include the numeric variables?
185 samples
4 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 4 times)
Summary of sample sizes: 168, 165, 166, 167, 166, 167, ...
Resampling results across tuning parameters:
  mtry  RMSE      Rsquared   MAE
   2    16764183  0.7843863  9267902
   9     9451598  0.8615202  3977457
  16     9639984  0.8586409  3813891
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 9.
Please help me understand mtry.
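Here is a rough sketch of the arithmetic that may be behind the mtry values 2, 9 and 16, written in plain Python rather than caret itself. It rests on two assumptions that the output does not confirm: that the model was fit through the formula interface, so each factor is expanded into (levels - 1) dummy columns, and that the default tuning grid is three evenly spaced, floored values from 2 up to the number of columns. The helper `mtry_grid` is a hypothetical name used only for illustration.

```python
import math

# Assumption: each factor is dummy-coded with one reference level dropped.
factor_levels = [3, 13]
numeric_predictors = 2

dummy_columns = sum(k - 1 for k in factor_levels)    # 2 + 12 = 14
total_columns = dummy_columns + numeric_predictors   # 14 + 2 = 16


def mtry_grid(p, length=3):
    """Evenly spaced candidates from 2 to p, floored (hypothetical helper)."""
    step = (p - 2) / (length - 1)
    return [math.floor(2 + i * step) for i in range(length)]


grid = mtry_grid(total_columns)
print(total_columns, grid)
```

If those assumptions hold, the 16 would be 14 dummy columns plus the 2 numeric variables, so the numeric variables would be counted after all, and 3 + 13 also adding up to 16 would just be a coincidence.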
2) Also, the sample sizes for the folds are 168, 165, 166, ... Why does the sample size change from fold to fold?
sample sizes: 168, 165, 166, 167, 166, 167
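A small check of the fold arithmetic, again in plain Python and only as a sketch. With 185 rows and 10 folds, perfectly even folds would hold out 18 or 19 rows each, whereas the printed training sizes imply hold-outs between 17 and 20. My guess, which the output does not confirm, is that the folds are built by stratifying on the outcome, which could leave them slightly unequal.

```python
n_rows = 185
n_folds = 10
training_sizes = [168, 165, 166, 167, 166, 167]  # first six values from the output

# Rows held out in each fold = total rows minus rows used for training.
held_out = [n_rows - s for s in training_sizes]

# What perfectly even folds would look like: sizes differ by at most one.
base, extra = divmod(n_rows, n_folds)
even_folds = [base + 1] * extra + [base] * (n_folds - extra)

print(held_out)
print(even_folds)
```

The first list gives the held-out rows implied by the printed sizes, and the second gives the sizes an exactly even split would produce, for comparison.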
Thank you so much.