The goal is to run several logit regressions on different training data frames and then evaluate each model's performance on the corresponding test set.
First, we create multiple (5) randomly sampled data frames from df:
for(i in 1:5) {
  nr <- paste("random_df", i, sep = "_")
  # shuffle the rows of df and store the copy as random_df_i
  assign(nr, df[sample(nrow(df)), ])
}
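Storing the shuffled copies in a list from the start avoids assign() and makes later looping simpler. A minimal sketch, assuming a simulated df with columns y, x1 and x2; the toy data and the name random_dfs are illustrative and not part of the original setup:

```r
set.seed(1)
# Toy data standing in for the real df (assumption)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- rbinom(100, 1, plogis(0.5 * df$x1 - 0.5 * df$x2))

# Five row-shuffled copies of df, kept together in one named list
random_dfs <- lapply(1:5, function(i) df[sample(nrow(df)), ])
names(random_dfs) <- paste("random_df", 1:5, sep = "_")
```

Individual copies are then available as random_dfs[[1]] or random_dfs$random_df_1.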
Then, we create indicators for the split into training and test sets:
train <- 1:(length(df$y)*0.8)
test <- !(1:nrow(df) %in% train)
Now we'd like to loop a logit regression over each training data frame. Herein lies the first problem: we are only able to get the output as a matrix or a list. We have tried collecting the random samples in a list, a data frame, or a matrix; this is the list version:
lr_list <- list(random_df_1,random_df_2,random_df_3,random_df_4,random_df_5)
Then, we loop the logit regressions over all data frames within the list:
for(i in 1:5) {
  index <- paste("lr_train", i, sep = "_")
  assign(index, lapply(lr_list, function(x) {
    glm(y ~ x1 + x2, data = x, subset = train, family = binomial)
  }))
}
Here lapply creates a list and sapply a matrix of each estimation result.
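It may help to note that each element of the list returned by lapply is already a complete glm object and can be extracted with [[ ]], whereas sapply tries to simplify the fits and collapses them into a matrix. A minimal sketch on simulated data; the toy df and the names dfs and models are assumptions made for illustration:

```r
set.seed(1)
# Toy data standing in for the real df (assumption)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- rbinom(100, 1, plogis(0.5 * df$x1 - 0.5 * df$x2))

train <- seq_len(floor(0.8 * nrow(df)))   # first 80% of row positions
dfs <- lapply(1:5, function(i) df[sample(nrow(df)), ])

# One logit fit per shuffled data frame, trained on its first 80% of rows
models <- lapply(dfs, function(d) {
  glm(y ~ x1 + x2, data = d[train, ], family = binomial)
})

class(models[[1]])   # inspect the class of a single list element
coef(models[[1]])    # the usual glm accessors work on it
```

Here the training rows are selected with d[train, ] rather than the subset argument, which is only a stylistic choice in this sketch.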
The goal is to get the output in the form of a glm object, so that we can run predictions with each trained glm and thus evaluate model performance across different train/test splits:
lr_test_1 <- predict(lr_train_1, random_df_1[test, ], type="response")
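One possible way to pair every fitted model with the held-out rows of its own data frame is Map, which walks two lists in parallel. This is a sketch under stated assumptions rather than a definitive solution: df is simulated, the names dfs, models, probs and accuracy are illustrative, and the 0.5 cutoff is an arbitrary choice.

```r
set.seed(1)
# Toy data standing in for the real df (assumption)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- rbinom(100, 1, plogis(0.5 * df$x1 - 0.5 * df$x2))

n <- nrow(df)
train <- seq_len(floor(0.8 * n))      # row positions used for fitting
test <- setdiff(seq_len(n), train)    # remaining row positions

dfs <- lapply(1:5, function(i) df[sample(n), ])
models <- lapply(dfs, function(d) {
  glm(y ~ x1 + x2, data = d[train, ], family = binomial)
})

# Predicted probabilities of each model on the test rows of its own data frame
probs <- Map(function(m, d) predict(m, newdata = d[test, ], type = "response"),
             models, dfs)

# Share of correctly classified test rows per split (cutoff 0.5 is an assumption)
accuracy <- mapply(function(p, d) mean((p > 0.5) == d$y[test]), probs, dfs)
```

probs[[i]] then plays the role of lr_test_i, and accuracy holds one value per train/test split.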
Any help is much appreciated.