I am trying to fit a random forest model with cross-validation after preprocessing with PCA, using the caret package. I am predicting two classes (variable dg) from 381 predictors, and I have 100 observations.

I expected that after preprocessing the model would work with the principal components only, but when I inspect the model results I get mtry values of 2, 191 and 381 variables.

Creation of the model:

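library(caret)

# 10-fold cross-validation repeated 100 times, keeping all resamples and predictions
cntrl <- trainControl(method = "repeatedcv",
                      number = 10,
                      repeats = 100,
                      returnResamp = "all",
                      savePredictions = "all",
                      preProcOptions = list(thresh = 0.8), # keep PCs explaining 80% of variance
                      classProbs = TRUE,                   # needed for ROC
                      summaryFunction = twoClassSummary
)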

rf_mod <- train(dg ~ .,
                 data = training,
                 method = "rf",
                 trControl = cntrl,
                 preProcess = c("pca"),
                 metric = "ROC")
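
To get a feel for how many components an 80% threshold would keep, here is a minimal sketch using base R's prcomp instead of caret's own preProcess, so it only approximates what train() does internally. The matrix x is simulated stand-in data with the same shape as mine (100 x 381), not the real training set, and the names x, cum_var and n_comp are made up for illustration.

```r
# Simulated stand-in for the real predictors: 100 observations, 381 variables.
set.seed(1)
x <- matrix(rnorm(100 * 381), nrow = 100, ncol = 381)

# PCA on centered and scaled predictors.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components whose cumulative variance reaches 80%.
n_comp <- which(cum_var >= 0.8)[1]
n_comp
```

With only 100 observations there can be at most 100 components, so whatever number is kept should be far below 381.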

Results of model:

 mtry  ROC        Sens     Spec     
    2   0.7770833  0.59750  0.8291667
  191   0.7776042  0.60250  0.8141667
  381   0.7765625  0.60375  0.8183333
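
One observation: the three mtry values are exactly what an evenly spaced grid of length 3 between 2 and the original number of predictors would give. The sketch below reproduces that with base R only; p and mtry_grid are illustrative names, and the idea that caret builds its default grid this way from the predictors before PCA is my assumption, not something I have confirmed.

```r
# Number of original predictors (before PCA) and default grid length of 3.
p <- 381
len <- 3

# Evenly spaced candidate values from 2 to p, rounded down.
mtry_grid <- floor(seq(2, p, length.out = len))
mtry_grid
# 2 191 381
```

If that assumption holds, the grid would reflect the original predictor count rather than the number of variables each forest was actually fitted on.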

What is the reason for this output? Why are all predictors included?

Matyas K.