I am trying to fit a random forest model with cross-validation after PCA preprocessing, using the caret package. I am predicting two classes (variable dg) from 381 predictors, and I have 100 observations.
I expected that after preprocessing the model would work with the principal components only, but when I inspect the model results I get mtry values of 2, 191 and 381.
Creation of the model:
cntrl <- trainControl(method = "repeatedcv",
                      number = 10,
                      repeats = 100,
                      returnResamp = "all",
                      savePredictions = "all",
                      preProcOptions = list(thresh = 0.8),  # 80% of variance explained
                      classProbs = TRUE,
                      summaryFunction = twoClassSummary)

rf_mod <- train(dg ~ .,
                data = training,
                method = "rf",
                trControl = cntrl,
                preProcess = c("pca"),
                metric = "ROC")
Results of model:
mtry  ROC        Sens     Spec
2     0.7770833  0.59750  0.8291667
191   0.7776042  0.60250  0.8141667
381   0.7765625  0.60375  0.8183333
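For reference, this is the kind of check I can run to see what the final forest was actually fitted on. It is only a minimal sketch on made-up data: `toy`, `ctrl_small` and `fit` are hypothetical names, the data is random noise rather than my real `training` set, and the small `ntree` and 5-fold CV are there just to keep it fast. It assumes the caret and randomForest packages are installed.

```r
library(caret)  # method = "rf" also needs the randomForest package

set.seed(1)

# Hypothetical stand-in data: 100 observations, 20 numeric predictors, two classes
x <- as.data.frame(matrix(rnorm(100 * 20), nrow = 100))
toy <- cbind(x, dg = factor(sample(c("a", "b"), 100, replace = TRUE)))

# Same idea as cntrl above, but with plain 5-fold CV so it runs quickly
ctrl_small <- trainControl(method = "cv",
                           number = 5,
                           preProcOptions = list(thresh = 0.8),
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

fit <- train(dg ~ .,
             data = toy,
             method = "rf",
             trControl = ctrl_small,
             preProcess = "pca",
             metric = "ROC",
             ntree = 100)  # passed through to randomForest()

fit$results$mtry                     # mtry values that train() evaluated
fit$preProcess$numComp               # number of components kept by the PCA step
rownames(fit$finalModel$importance)  # predictor names seen by the final forest
```

Comparing the three printed values should show whether the mtry grid and the number of retained components agree, and whether the final forest's predictors are principal components or the original columns.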
What is the reason for this output? Why are all predictors included?