All of the code in this question can be found in my GitHub repository for this research project on Estimated Exhaustive Regression, specifically in the "Both BE & FS script" and "LASSO code" R scripts. To reproduce, you can use the heavily truncated dataset folder "sample_obs(20)" rather than "spencer": the former contains only 20 CSVs, while the latter contains 58.5k!
I am running both Backward Elimination and Forward Selection stepwise regression on each of N CSV datasets in a folder, using the following code (run once the N datasets have been loaded into a list called datasets):
set.seed(11) # for reproducibility
full_models <- vector("list", length = length(datasets))
BE_fits <- vector("list", length = length(datasets))
head(BE_fits, n = 3) # returns a list of 3 elements, all of which are NULL
for (i in seq_along(datasets)) {
  # fit the full model, then prune it by backward elimination
  full_models[[i]] <- lm(formula = Y ~ ., data = datasets[[i]])
  BE_fits[[i]] <- step(object = full_models[[i]],
                       scope = formula(full_models[[i]]),
                       direction = 'backward',
                       trace = 0)
}
And to get the final results I want, I use the following:
BE_Coeffs <- lapply(BE_fits, coef)
Models_Selected_by_BE <- lapply(BE_fits, \(fit) names(coef(fit)))
And for FS Stepwise, I used:
set.seed(11) # for reproducibility
null_models <- vector("list", length = length(datasets))
FS_fits <- vector("list", length = length(datasets))
head(FS_fits, n = 3) # returns a list of 3 elements, all of which are NULL
for (j in seq_along(datasets)) {
  # start from the intercept-only model; full_models comes from the BE code above
  null_models[[j]] <- lm(formula = Y ~ 1, data = datasets[[j]])
  FS_fits[[j]] <- step(object = null_models[[j]],
                       direction = 'forward',
                       scope = formula(full_models[[j]]),
                       trace = 0)
}
I got much of the syntax for this code from previous questions I asked here several months ago. I am now rerunning all of my models on a new folder of randomly generated synthetic datasets, and I don't want to reuse this code because last time it took well over 12 to 14 hours for the BE and FS stepwise procedures to finish running.
P.S. I was already able to avoid a loop when I did the same thing for LASSO regression, my first benchmark variable selection procedure, using the following code, which relies on a function from R's apply family (this takes only 2-3 hours):
library(dplyr)      # select(), starts_with()
library(elasticnet) # enet()
set.seed(11) # to ensure replicability
LASSO_fits <- lapply(datasets, function(i)
  enet(x = as.matrix(select(i, starts_with("X"))),
       y = i$Y, lambda = 0, normalize = FALSE))
However, I could not figure out how to do something similar for either basic version of stepwise regression because of the all-important initialization step beforehand.
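One possible shape for such a version, as a sketch only: if the full (or null) model is fitted and passed to step() inside the same anonymous function, the initialization step moves into the lapply() call and no pre-allocated list is needed. The toy_datasets list below is hypothetical stand-in data (a response Y and predictors X1 to X5), not the real CSVs, and Models_Selected_by_FS is a name introduced here only for illustration.

```r
# Hypothetical toy data standing in for the real CSVs:
# a response Y and predictors X1..X5, of which only X1 and X2 matter
set.seed(11)
toy_datasets <- lapply(1:3, function(k) {
  X <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
  names(X) <- paste0("X", 1:5)
  X$Y <- 3 * X$X1 + 2 * X$X2 + rnorm(100)
  X
})

# Backward elimination: fit the full model and prune it in one function call
BE_fits <- lapply(toy_datasets, function(d) {
  full <- lm(Y ~ ., data = d)
  step(full, direction = "backward", trace = 0)
})

# Forward selection: start from the intercept-only model and use the
# full model's formula as the upper scope
FS_fits <- lapply(toy_datasets, function(d) {
  full <- lm(Y ~ ., data = d)
  null <- lm(Y ~ 1, data = d)
  step(null, scope = formula(full), direction = "forward", trace = 0)
})

# Names of the terms retained by each procedure, one vector per dataset
Models_Selected_by_BE <- lapply(BE_fits, function(fit) names(coef(fit)))
Models_Selected_by_FS <- lapply(FS_fits, function(fit) names(coef(fit)))
```

Keeping lm() and step() in the same function matters because step() refits the model in the calling frame, where d must be visible. Note that lapply() by itself is not expected to be faster than a pre-allocated for loop; any real speed-up would more likely come from running the same function over the datasets in parallel.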