I'm using the varImp function from the R package caret to get the importance of variables. This is my code:
library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                       search = "grid", summaryFunction = youdenSumary)
classifier <- train(form = Target ~ ., data = training_set, method = "rpart",
                    parms = list(split = "information"), trControl = trctrl,
                    tuneLength = 10, metric = "j")
importance <- varImp(classifier, scale = FALSE)
This is the resulting variable importance:
rpart variable importance

      Overall
nh    532.218
nRT   488.922
wdSu  482.582
av_t  390.266
nc    317.725
o     303.738
dt    291.488
wdMo  103.200
wdSa   49.690
ne     46.707
wdWe   41.642
nl     26.463
wdTu    9.506
wdTh    2.669
The code runs the recursive partitioning algorithm and keeps track of how much each split reduces the loss function. But what is the loss function in this case? The R documentation says:
The reduction in the loss function (e.g. mean squared error) attributed to each variable at each split is tabulated and the sum is returned. Also, since there may be candidate variables that are important but are not used in a split, the top competing variables are also tabulated at each split. This can be turned off using the maxcompete argument in rpart.control. This method does not currently provide class-specific measures of importance when the response is a factor.
It mentions the mean squared error. Is this the loss function used in this package? I'm not sure how to read that "e.g." in parentheses.
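For reference, here is a minimal sketch of how the per-split values can be inspected directly on an rpart fit. It assumes that the "improve" column of the fitted object's splits matrix is what the documentation calls the reduction in the loss function; that is my reading, not something the documentation states. The iris data is only a stand-in for training_set, and surrogate splits are switched off because rpart stores something other than the improvement in that column for surrogates. This is not meant to reproduce what varImp does internally.

```r
library(rpart)

# Stand-in data: iris replaces training_set, Species replaces Target.
# maxsurrogate = 0 keeps only primary and competing splits in fit$splits,
# so every "improve" entry is an improvement value (assumption: surrogates
# are left out here to keep the column homogeneous).
fit <- rpart(Species ~ ., data = iris, method = "class",
             parms = list(split = "information"),
             control = rpart.control(maxsurrogate = 0))

# Row names of fit$splits are the variable names; one row per split.
improve <- fit$splits[, "improve"]
split_var <- rownames(fit$splits)

# Sum the recorded improvement per variable, largest first.
improve_by_var <- sort(tapply(improve, split_var, sum), decreasing = TRUE)
print(improve_by_var)
```

With a caret model, the same matrix should be reachable as classifier$finalModel$splits, since train stores the fitted rpart object in finalModel. Refitting with split = "gini" and comparing the values is one way to check whether the splitting criterion, rather than mean squared error, drives the numbers for a factor response.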