0

I've resampled with replacement a data set 1000 times and now want to fit three models to each of these 1000 data sets and bag their AIC scores. The end goal of this procedure is to obtain mean AIC scores for each model across all models along with their 95% confidence intervals. The code below is faulty and I don't know where I am making a mistake. What happens is that the final matrix only contains the AIC score vectors from the first few iterations (i.e. not all 1000). Is there an error in the way I initialized the main matrix or the vectors at each iteration? Or maybe my row-adding procedure is flawed? Or, if the code is correct, could it be something with the datasets that are being fed into this code? If the latter is the case, then why am I not getting an error when the code reads in these datasets and just skips them? I've been struggling with this for days and am very puzzled, so any help would be appreciated.

require(lme4)
require(lmerTest)

# initializing an empty matrix for storing each vector of AIC scores from each iteration
# the matrix has width 3 because three models are fitted at each iteration
AIC.scores = data.frame(matrix(, nrow = 0, ncol = 3))

#fit regression models to each of 1000 datasets
for(iter in 1:1000){

  #retrieving the data set, named accordingly, for the current iteration
  data = read.csv(paste("data_set_", iter,".csv", sep=""), header=TRUE)

  #initializing vector of AICs from models in current iteration
  AIC.score = vector(mode="numeric", length=3)

  mod1 = lmer(RT.log ~ crit.var1.log.std +
                       (1|Subject) +
                       (1|Item),
                          data = data,
                          REML=FALSE)

  AIC.score[1] = summary(mod1)$AIC[1]

  mod2 = lmer(RT.log ~ crit.var2.log.std +
                       (1|Subject) +
                       (1|Item),
                          data = data,
                          REML=FALSE)

  AIC.score[2] = summary(mod2)$AIC[1]

  mod3 = lmer(RT.log ~ crit.var3.log.std +
                       (1|Subject) +
                       (1|Item),
                          data = data,
                          REML=FALSE)

  AIC.score[3] = summary(mod3)$AIC[1]

  #adding vector of AICs scores from current iteration to main matrix
  AIC.scores = rbind(AIC.scores, t(AIC.score))

  cat("bagging iteration", iter, "completed!\n")

 }

#renaming column names in AIC score matrix
colnames(AIC.scores) = c("model1", "model2", "model3")

# function for calculating mean AIC and 95% C.I.s for each model across all  iterations
norm.interval = function(data, z=1.96) {
  mean = mean(data)
  variance = var(data)
  sd = sqrt(variance/length(data))
  c(mean, mean - z * sd, mean + z * sd)
}

for (i in 1:3) {

  cat("The mean, lCI, uCI for model", i, "are:", norm.interval(AIC.scores[,i]), "\n")

}
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
Des Grieux
  • 520
  • 1
  • 5
  • 31

1 Answers1

1

Shot in the dark without knowing specifically what your lmer models are or what you data is.

Read in all your data.frame as a single list:

all.data <- lapply(paste0("data_set_", 1:1000, ".csv"), read.csv, header=TRUE)

Churn out the lmer formulae in text form, based on the list of data above:

all.form <- lapply(paste0("all.data[[", 1:1000, "]]"), function(x) list(
  mod1 = paste0("lmer(RT.log ~ crit.var1.log.std +  (1|Subject) + (1|Item), REML=FALSE, data =", x, ")"),
  mod2 = paste0("lmer(RT.log ~ crit.var2.log.std +  (1|Subject) + (1|Item), REML=FALSE, data =", x, ")"),
  mod3 = paste0("lmer(RT.log ~ crit.var3.log.std +  (1|Subject) + (1|Item), REML=FALSE, data =", x, ")")
  )) 

Execute the lmer formula and write down the models into a single list:

all.lmer.mod <- lapply(all.form. function(x) lapply(x, function(y) eval(parse(text=y))))

Extract all AIC value of lmer models:

all.AIC <- lapply(all.lmer.mod, function(x) lapply(x, AIC))

Be warned that the process will take long if you have large data.frame and you have many subjects and items level. Test it out by changing 1:1000 on top to say 1:2 first.

Adam Quek
  • 6,973
  • 1
  • 17
  • 23
  • Thanks, Adam! The code looks very elegant, although it seems it will take longer since you have two separate loops compared to one in my code. How is your code an improvement on mine? It would be helpful to understand what aspects of my code it is improving on, so that I can identify the mistake. I know it may not be possible because, as you point out, the data is not there... – Des Grieux Apr 18 '17 at 03:49
  • It's an improvement in the sense that you evaluate and keep all the model object in a nested list, i.e. `all.lmer.mod[[1]]` will contain 3 objects for the evaluated mod1 to 3 using data_set_1. This allow you to go back to this object directly to look for AIC value, BIC value, plot, predict, etc. E.g. if you want to find the AIC of the first model for data_set_1, you run `AIC(all.lmer.mod[[1]][1])`. This is the option that you will not have if everything is build into a `for-loop`. – Adam Quek Apr 18 '17 at 03:54
  • Reason why this is an option you do not have in the `for-loop` is that when `iter` change from 1 to 2, your `mod1` will change from mod1 of iter1 to mod 1 of iter2. – Adam Quek Apr 18 '17 at 03:55