I created a list of linear models after performing k-fold cross validation. When I then use
map(modList, ~ stepwise(., direction = "backward",criterion = "AIC"))
I get this error:
Error in stepAIC(mod, scope = list(lower = lower, upper = upper), direction = direction, : number of rows in use has changed: remove missing values?
I found advice [here][1] and although I didn't have any NA values to begin with, I used na.omit()
when I created each model:
mod <- lm(z_mara ~ . -1, data = na.omit(trainCV_sets[[i]]))
What's curious is that if I use stepwise()
on each individual model within my modList
, I only get the error on the first through third models:
stepwise(modList[[1]], direction = "backward", criterion = "AIC")
The fourth model in the list work just fine (e.g change [[1]]
to [[4]]
). The only difference I've found so far between the models that do not work and those that do is that the number of rows used to create the 1st-3rd models are 11 while has 12 rows. I tried changing the number of k-folds from 4 to 2 and I was able to reproduce a similar issue where stepwise()
produces the error for some, but not all models.
I expected that the rows wouldn't matter, and I'm currently stuck on what to do next. I don't know of a way to avoid having unequal numbers of rows after cross validation because the data won't usually be divisible by whatever number of folds I choose. Thanks!
EDIT Below is a test dataset and code you can use:
training_pca <- tibble(z_mara = c(0.875112399579427, 1.13894237770963,
0.981180086218575, 1.8299894270253, -0.0977104233331723, -0.86241558837047,
-1.40078945237221, 0.030856469562766, -0.0969068802525719, -0.223063143906711,
-0.339309042900122, -0.760633464827852, -1.01830294600679, -1.29686454728133,
-1.39007554463088), PC1 = c(2.50796906215996, 2.68002216940135,
-3.71526007848766, -4.1866795459591, 0.0486434530855795, 0.938304238296873,
0.615097559709769, 4.51822579066407, 0.0667126218105262, -0.287378553280471,
4.56603423346247, -0.933301952142784, -0.216058021757341, 0.0694218190496208,
9.2292656007433), PC2 = c(2.17078441684463, -0.199721476838284,
2.11735035612631, 3.03544498625801, -2.40626489140099, 2.35791076587204,
-1.25582416436226, 1.05994935645006, -0.527916055713799, 0.070729476106444,
-0.132485078109658, 0.949419002097709, -5.09998831832504, 1.0619915074115,
2.3100848754419), PC3 = c(-0.678144405424994, 1.09058577380406,
-1.0176297169444, -1.88552787304444, -1.25338902691145, 0.210499981706639,
0.04477690001186, -1.38969813324165, -0.804920509008458, 0.879138015029564,
1.29938967213637, 0.138188073442918, -5.26695126718185, -2.14797795807358,
1.62335052575001), PC4 = c(-2.3887668089524, 3.97903517088607,
0.867800635964586, -2.03442689569174, 0.13869948487601, -0.0664645238981098,
0.670700717048399, 3.11260554952338, 0.408133658723152, 3.44132931097882,
1.10549707214258, 1.0954984094734, -0.49754510472686, -2.85815085785707,
2.04309500230981), PC5 = c(0.821714377740757, 3.78554017369849,
0.230169453679477, -1.9039365507055, 0.0828762592235243, -1.83817906200712,
-2.4669972669382, 1.88139312797292, -1.28458814157848, -1.74987295529079,
0.342015969239523, -0.819381168596516, 1.64847329083668, -3.95311876523071,
-0.306975624219868))
set.seed(3654)
trainingCV_pca <- crossv_kfold(training_pca, k = 4)
trainCV_sets <- vector(mode = "list", length = nrow(trainingCV_pca))
testCV_sets <- vector(mode = "list", length = nrow(trainingCV_pca))
for(i in 1:nrow(trainingCV_pca)) {
trainCV_set_name <- paste("trainCV", i, sep = "")
trainCV_idx <- trainingCV_pca[[1]][[i]][[2]]
trainCV_set <- training_pca[trainCV_idx,]
assign(trainCV_set_name, trainCV_set)
trainCV_sets[[i]] <- trainCV_set
testCV_set_name <- paste("testCV", i, sep = "")
testCV_set <- training_pca[-trainCV_idx,]
assign(testCV_set_name, testCV_set)
testCV_sets[[i]] <- testCV_set
}
modList <- vector(mode = "list", length = nrow(trainingCV_pca))
for(i in 1:nrow(trainingCV_pca)) {
train_mod_name <- paste("fullTrainMod", i, sep = "")
mod <- lm(z_mara ~ . -1, data = trainCV_sets[[i]]) # centered and scaled so no intercept
modList[[i]] <- assign(train_mod_name, mod)
}
map(modList, ~ stepwise(., direction = "backward",criterion = "AIC"))```
[1]: https://stackoverflow.com/questions/11819472/why-does-the-number-of-rows-change-during-aic-in-r-how-to-ensure-that-this-does
[2]: https://github.umn.edu/hesse151/Marathon-Machine-Learning/blob/master/code/pre-reg/pre-reg_regress_pca.R
[3]: https://github.umn.edu/hesse151/Marathon-Machine-Learning/blob/master/data/processed/pca_training.csv