How can I vectorize this loop? With the backward stepwise regression inside the loop, it takes over 15 minutes to run. (My full dataset has over 4,000 observations and more than 20 independent variables.) I'm new to the whole concept of vectorization.
I've looked into turning the loop body into a function and then using an ifelse statement to handle the training/validation split, but I haven't been able to get that to work. Any ideas?
Here is a small dataset:
name <- c("Joe I.", "Joe I.", "Joe I.", "Joe I.", "Jane P.", "Jane P.", "Jane P.", "Jane P.",
"John K.", "John K.", "John K.", "John K.")
name_id <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
grade <- c(80, 99, 70, 65, 88, 90, 76, 65, 67, 68, 89, 67)
score <- c(82, 93, 72, 61, 89, 93, 71, 63, 64, 65, 82, 62)
attendance <- c(80, 99, 82, 62, 70, 65, 88, 90, 76, 93, 71, 99)
participation <- c(71, 63, 64, 71, 99, 76, 65, 67, 93, 72, 68, 89)
# Build the data frame directly so the numeric columns stay numeric
df <- data.frame(name, name_id, grade, score, attendance, participation)
Here is the loop:
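library(MASS)      # stepAIC()
library(magicfor)  # magic_for(), magic_result_as_dataframe()
magic_for(print, silent = TRUE)  # capture each print() call inside the loop
for (i in 1:3) {
  # Hold out one person for validation and train on the rest
  validation <- df[df$name_id == i, ]
  training <- df[df$name_id != i, ]
  # All three predictors go in the formula, joined with +
  m <- lm(score ~ grade + attendance + participation, data = training)
  stepm <- stepAIC(m, direction = "backward", trace = FALSE)
  pred1 <- predict(stepm, validation)
  print(pred1)
}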
options(max.print=999999)
pred1 <- magic_result_as_dataframe()
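A minimal sketch of the function-based approach mentioned above, assuming the formula is meant to include all three predictors. `predict_fold` and `pred_df` are hypothetical names, not part of the original code, and the toy data is rebuilt as a data frame so the block runs on its own. It swaps the `for` loop and magicfor for `lapply()`; `stepAIC()` still runs once per held-out person, so this restructures the loop rather than guaranteeing a speedup.

```r
library(MASS)  # stepAIC()

# Toy data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  name_id       = rep(1:3, each = 4),
  grade         = c(80, 99, 70, 65, 88, 90, 76, 65, 67, 68, 89, 67),
  score         = c(82, 93, 72, 61, 89, 93, 71, 63, 64, 65, 82, 62),
  attendance    = c(80, 99, 82, 62, 70, 65, 88, 90, 76, 93, 71, 99),
  participation = c(71, 63, 64, 71, 99, 76, 65, 67, 93, 72, 68, 89)
)

# Hypothetical helper: hold out one name_id, fit on the rest,
# run backward selection, and predict the held-out rows.
predict_fold <- function(id, data) {
  validation <- data[data$name_id == id, ]
  training   <- data[data$name_id != id, ]
  # lm() and stepAIC() are called in the same function so that stepAIC()
  # can find `training` when it refits the reduced models.
  m     <- lm(score ~ grade + attendance + participation, data = training)
  stepm <- stepAIC(m, direction = "backward", trace = FALSE)
  data.frame(name_id = id, pred = unname(predict(stepm, validation)))
}

# One call per person, stacked into a single data frame of predictions
ids     <- sort(unique(df$name_id))
pred_df <- do.call(rbind, lapply(ids, predict_fold, data = df))
```

`pred_df` ends up with one row per observation in `df`, so no `print()` capture or `max.print` change is needed to collect the results.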