As requested in the comments, here is a custom function to evaluate the cross-validation test error. I am not sure whether it can be extracted directly from the caret train object.
After running caret's train, extract the folds used for the best tune:
library(tidyverse)

# join the best tuning parameters to the held-out predictions, keep the row
# index and the fold label, and turn "Fold01"..."Fold10" into the numbers 1-10
model$bestTune %>%
  left_join(model$pred) %>%
  select(rowIndex, Resample) %>%
  mutate(Resample = as.numeric(gsub(".*(\\d$)", "\\1", Resample)),
         Resample = ifelse(Resample == 0, 10, Resample)) %>%   # "Fold10" ends in 0
  arrange(rowIndex) -> resamples
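If you want to double-check the extraction: caret also stores the held-out row indices in model$control$indexOut (at least in the versions I have used, so treat this as an assumption). A quick sanity check could look like this:

# assumes model$control$indexOut holds one integer vector of held-out rows per fold
caret_folds <- stack(model$control$indexOut)            # values = row index, ind = fold name
caret_folds <- caret_folds[order(caret_folds$values), ]
all(caret_folds$values == resamples$rowIndex)           # TRUE if the extraction matches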
Construct a cross-validation function that uses the same folds caret did:
library(xgboost)

train <- my_data[, !names(my_data) %in% "Class"]
label <- as.numeric(my_data$Class) - 1

test_auc <- lapply(1:10, function(x){
  # fit on every fold except fold x, with the tuning values caret selected;
  # the booster is called `fit` so it does not shadow the caret `model` object
  fit <- xgboost(data = data.matrix(train[resamples[,2] != x, ]),
                 label = label[resamples[,2] != x],
                 nrounds = model$bestTune$nrounds,
                 max_depth = model$bestTune$max_depth,
                 gamma = model$bestTune$gamma,
                 colsample_bytree = model$bestTune$colsample_bytree,
                 objective = "binary:logistic",
                 eval_metric = "auc",
                 print_every_n = 50)
  # predictions on the training folds and on the held-out fold x
  preds_train <- predict(fit, data.matrix(train[resamples[,2] != x, ]))
  preds_test  <- predict(fit, data.matrix(train[resamples[,2] == x, ]))
  auc_train <- pROC::auc(pROC::roc(response = label[resamples[,2] != x], predictor = preds_train, levels = c(0, 1)))
  auc_test  <- pROC::auc(pROC::roc(response = label[resamples[,2] == x], predictor = preds_test, levels = c(0, 1)))
  data.frame(fold = x, auc_train, auc_test)
})
do.call(rbind, test_auc)
#output
fold auc_train auc_test
1 1 1 0.9909091
2 2 1 0.9797980
3 3 1 0.9090909
4 4 1 0.9629630
5 5 1 0.9363636
6 6 1 0.9363636
7 7 1 0.9181818
8 8 1 0.9636364
9 9 1 0.9818182
10 10 1 0.8888889
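If you want a single cross-validation estimate, the per-fold results can be summarised with plain base R on the objects created above:

cv_res <- do.call(rbind, test_auc)
colMeans(cv_res[, c("auc_train", "auc_test")])   # mean train and test AUC over the 10 folds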
Compare this with the per-fold performance that caret reports:

arrange(model$resample, Resample)
#output
ROC Sens Spec Resample
1 0.9909091 1.0000000 0.8000000 Fold01
2 0.9898990 0.9090909 0.8888889 Fold02
3 0.9909091 0.9090909 1.0000000 Fold03
4 0.9444444 0.8333333 0.8888889 Fold04
5 0.9545455 0.9090909 0.8000000 Fold05
6 0.9272727 1.0000000 0.7000000 Fold06
7 0.9181818 0.9090909 0.9000000 Fold07
8 0.9454545 0.9090909 0.8000000 Fold08
9 0.9909091 0.9090909 0.9000000 Fold09
10 0.8888889 0.9090909 0.7777778 Fold10
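To inspect the two tables side by side you can bind the manual per-fold test AUC to caret's ROC column (reusing cv_res from above, purely for inspection):

caret_res <- arrange(model$resample, Resample)
data.frame(fold       = cv_res$fold,
           manual_auc = cv_res$auc_test,
           caret_roc  = caret_res$ROC)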
Why the test-fold AUC from my function and the one reported by caret are not identical I cannot say for certain. I am fairly sure the same parameters and folds were used, so I assume it has to do with the random seed (see the sketch after the next output for another possible culprit). When I compute the AUC of caret's held-out predictions myself, I do get the same output as caret:
model$bestTune %>%
  left_join(model$pred) %>%
  arrange(rowIndex) %>%
  select(M, Resample, obs) %>%   # M is the class-probability column in model$pred
  mutate(Resample = as.numeric(gsub(".*(\\d$)", "\\1", Resample)),
         Resample = ifelse(Resample == 0, 10, Resample),
         obs = as.numeric(obs) - 1) %>%
  group_by(Resample) %>%
  do(auc = as.vector(pROC::auc(pROC::roc(response = .$obs, predictor = .$M)))) %>%
  unnest()
#output
Resample auc
<dbl> <dbl>
1 1.00 0.991
2 2.00 0.990
3 3.00 0.991
4 4.00 0.944
5 5.00 0.955
6 6.00 0.927
7 7.00 0.918
8 8.00 0.945
9 9.00 0.991
10 10.0 0.889
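Besides the seed, one possible culprit for the remaining gap is that the manual refit above passes only part of the tuning grid: assuming the model was tuned with method = "xgbTree", bestTune also contains eta, min_child_weight and subsample. A hedged sketch of refitting a single fold (fold 1 here) with the full set:

# assumes method = "xgbTree", whose bestTune holds nrounds, max_depth, eta,
# gamma, colsample_bytree, min_child_weight and subsample
full_params <- c(as.list(model$bestTune[, c("eta", "max_depth", "gamma",
                                            "colsample_bytree",
                                            "min_child_weight", "subsample")]),
                 objective = "binary:logistic",
                 eval_metric = "auc")
fit_full <- xgboost(data    = data.matrix(train[resamples[,2] != 1, ]),
                    label   = label[resamples[,2] != 1],
                    params  = full_params,
                    nrounds = model$bestTune$nrounds,
                    verbose = 0)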
But again I emphasize that the error on a small held-out test set will tell you little, and you should rely on the cross-validated error estimated during training. If you would like to bring the two closer, consider fiddling with the gamma, alpha and lambda regularization parameters.
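For reference, a minimal sketch of where those regularization knobs go in a plain xgboost call; as far as I know alpha and lambda are not tuned by caret's default xgbTree grid, and the values below are arbitrary placeholders:

fit_reg <- xgboost(data    = data.matrix(train),
                   label   = label,
                   nrounds = model$bestTune$nrounds,
                   params  = list(objective = "binary:logistic",
                                  eval_metric = "auc",
                                  gamma  = 1,    # minimum loss reduction to make a split
                                  alpha  = 0.5,  # L1 penalty on leaf weights
                                  lambda = 2),   # L2 penalty on leaf weights
                   verbose = 0)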
With a small data set I would still try an 80:20 train:test split and use the independent test set to verify whether the CV error is close to the test error.
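A minimal sketch of that split, assuming the same my_data object and using createDataPartition to keep the class proportions:

library(caret)
set.seed(42)                                             # arbitrary seed
in_train  <- createDataPartition(my_data$Class, p = 0.8, list = FALSE)
train_set <- my_data[in_train, ]
test_set  <- my_data[-in_train, ]
# run train() with CV on train_set only, then compare the resampled ROC with
# the AUC of the predicted class probabilities on test_set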