I am trying to do model selection and want to retrieve the mean RMSE from 10-fold cross-validation. For some models I can use the train() function from the caret package, but for other models I want to look at I have found a manual way to do k-fold cross-validation here: https://www.r-bloggers.com/2016/06/bootstrap-and-cross-validation-for-evaluating-modelling-strategies/

However, the RMSE values for the cross-validated models differ more than I would expect. Below is the code where I apply different methods of retrieving the RMSE to the same model:

library(caret)
library(datasets)

exd <- warpbreaks

##k-fold cross validation
Repeats <- 100
cv_repeat_num <- Repeats / 10

the_control <- trainControl(method = "repeatedcv", number = 10, repeats = cv_repeat_num)
cv_ex <- train(breaks ~ wool + tension, data = exd, method = "glm", family = "poisson", trControl = the_control)

m_ex <- glm(data = exd, breaks~wool+tension, family = "poisson")
results <- numeric(10 * cv_repeat_num)
for(j in 0:(cv_repeat_num - 1)){
  cv_group <- sample(1:10, nrow(exd), replace = TRUE)
  for(i in 1:10){
    train_data <- exd[cv_group != i, ]
    test_data <- exd[cv_group == i, ]
    m_ex <- update(m_ex, data = train_data)
    results[j * 10 + i] <- RMSE(
      predict(m_ex, newdata = test_data),
      test_data$breaks)
  }
}
#RMSE from manual cross validation
mean(results)
#RMSE from RMSE function, no cross validation
RMSE(predict(m_ex, exd), exd$breaks)
#RMSE from train function
mean(cv_ex$resample$RMSE)

This difference in RMSE does not occur when I use a simple linear model instead of a Poisson model as in this example. Can someone shed some light on why this is, and whether I can simply use the manual approach I found?
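One possible cause, offered as a hypothesis rather than a confirmed answer: `predict()` on a `glm` object defaults to `type = "link"`, so for a Poisson model with its log link the manual loop would be comparing log-scale predictions against raw counts. As far as I understand, caret's `glm` method predicts on the response scale, which would also explain why the gap disappears for a plain linear model, where the link is the identity. The sketch below uses base R only. The `rmse()` helper, the seed, and the balanced fold assignment are illustrative assumptions, not part of the original code.

```r
# Minimal sketch: link-scale vs response-scale predictions for a Poisson GLM.
exd <- warpbreaks
set.seed(1)  # arbitrary seed, only for reproducibility of the fold split

# Hypothetical helper standing in for caret::RMSE()
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

fit <- glm(breaks ~ wool + tension, data = exd, family = "poisson")

pred_link <- predict(fit, newdata = exd)                     # default: type = "link" (log scale)
pred_resp <- predict(fit, newdata = exd, type = "response")  # expected counts

# exp() of the link-scale prediction recovers the response-scale prediction
print(all.equal(exp(pred_link), pred_resp))

print(rmse(pred_link, exd$breaks))  # log-scale predictions vs counts
print(rmse(pred_resp, exd$breaks))  # predictions on the same scale as the counts

# Manual 10-fold CV on the response scale, with balanced folds so that
# no fold can end up empty
k <- 10
folds <- sample(rep(1:k, length.out = nrow(exd)))
cv_rmse <- sapply(1:k, function(i) {
  fit_i <- glm(breaks ~ wool + tension, data = exd[folds != i, ], family = "poisson")
  rmse(predict(fit_i, newdata = exd[folds == i, ], type = "response"),
       exd$breaks[folds == i])
})
print(mean(cv_rmse))
```

If this is the cause, adding `type = "response"` to both `predict()` calls in the manual code should bring its RMSE much closer to the value from `train()`. Separately, `sample(1:10, nrow(exd), replace = TRUE)` can leave a fold empty on a data set this small, in which case that fold's RMSE is `NaN` and so is the mean. The balanced assignment above is one way to avoid that.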

phiver
EllyJ