I am comparing the properties of a new automated optimal factor/variable selection technique (or, more correctly, an optimal model selection technique) against two or three standard benchmarks. For the benchmark methods, we have decided to go with LASSO first and Backward Elimination (BE) Stepwise Regression second. Just out of curiosity, I also tried running a Forward Selection (FS) Stepwise Regression on the 47,501 synthetic datasets created for the Monte Carlo simulation underlying the benchmark comparisons. I have already successfully run my LASSO and BE Stepwise Regressions iteratively on all 47k datasets.
So I thought that slightly altering the BE Stepwise code would be enough to get an FS Stepwise working as well, but the model it fits for every dataset is the same: just the intercept! All of the R scripts, along with a much smaller practice version of the folder containing the individual (CSV-formatted) datasets, can be found in my GitHub repository for this research project.
Here is the code I have already used successfully to run my BE Stepwise Regressions:
```r
directory_path <- "~/DAEN_698/other datasets/sample_obs2"
filepath_list <- list.files(path = directory_path, full.names = TRUE,
                            recursive = TRUE)
# strip the folder path and the .csv extension from each dataset's file name
DS_names_list <- basename(filepath_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)
## This line reads all of the data in each of the csv files
## using the file paths stored in the list we just created.
datasets <- lapply(filepath_list, read.csv)
```
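`DS_names_list` is computed but never attached to anything, so one option is to use it to name the list of data frames; every fitted model can then be traced back to its CSV file. Below is a minimal sketch, assuming all files sit under one folder. The temporary folder and the toy files `ds_001.csv`/`ds_002.csv` are hypothetical stand-ins for the real `sample_obs2` folder, and the `pattern` argument is an added safeguard against reading non-CSV files:

```r
# Hypothetical stand-in for "~/DAEN_698/other datasets/sample_obs2":
# write two toy CSV files into a temporary folder.
directory_path <- file.path(tempdir(), "sample_obs_demo")
dir.create(directory_path, showWarnings = FALSE)
write.csv(data.frame(Y = 1:3, X1 = c(2, 4, 6)),
          file.path(directory_path, "ds_001.csv"), row.names = FALSE)
write.csv(data.frame(Y = 4:6, X1 = c(1, 3, 5)),
          file.path(directory_path, "ds_002.csv"), row.names = FALSE)

# List the files once and reuse that vector for both the names and the reading.
filepath_list <- list.files(path = directory_path, pattern = "\\.csv$",
                            full.names = TRUE, recursive = TRUE)
DS_names_list <- tools::file_path_sans_ext(basename(filepath_list))

# Read every file and label each data frame with its file name.
datasets <- setNames(lapply(filepath_list, read.csv), DS_names_list)

names(datasets)
```

Because `lapply()` keeps names, any list later built with `lapply(datasets, ...)` would carry the same dataset labels.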
```r
### Step 3: Run a Backward Elimination Stepwise Regression
### function on each of the 47,500 datasets.
full_model <- vector("list", length = length(datasets))
BE_fits <- vector("list", length = length(datasets))
BE_fits # returns a list of NULL elements (15 of them for the practice folder)
set.seed(11) # for reproducibility (lm() and step() are deterministic, so optional)
for (i in seq_along(datasets)) {
  full_model[[i]] <- lm(formula = Y ~ ., data = datasets[[i]])
  BE_fits[[i]] <- step(object = full_model[[i]], scope = formula(full_model[[i]]),
                       direction = 'backward', trace = 0)
}
```
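For anyone who cannot reach the repository, the backward loop can be reproduced without the CSV files. This is a minimal sketch assuming simulated data: the helper `make_dataset()`, the predictors `X1` to `X5`, and the true model (only `X1` and `X2` matter) are hypothetical stand-ins for the real synthetic datasets.

```r
# Hypothetical data-generating model: Y depends on X1 and X2 only.
set.seed(11)
make_dataset <- function(n = 200) {
  X <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("X", 1:5))))
  X$Y <- 1 + 2 * X$X1 - 1.5 * X$X2 + rnorm(n)
  X
}
datasets <- list(make_dataset(), make_dataset())

full_model <- vector("list", length = length(datasets))
BE_fits    <- vector("list", length = length(datasets))
for (i in seq_along(datasets)) {
  # Start from the model containing every predictor ...
  full_model[[i]] <- lm(Y ~ ., data = datasets[[i]])
  # ... and let step() drop terms for as long as AIC improves.
  BE_fits[[i]] <- step(full_model[[i]], scope = formula(full_model[[i]]),
                       direction = "backward", trace = 0)
}

# Selected terms per dataset
lapply(BE_fits, function(fit) names(coef(fit)))
```

Each element of the final `lapply()` should contain `(Intercept)`, `X1` and `X2`, plus whichever noise predictors AIC happens to keep for that draw.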
With all of that working, I figured that all I would need to do to run an FS Stepwise was the following:
```r
### Step 4/5 (optional): Run a Forward Selection Stepwise Regression
### function on each of the 47,500 datasets.
### Assign the null models to their corresponding datasets and
### store these in the object "null_models"
# try this line below if the line of code after it does not run/work
#null_model = lm(datasets[[1]]$Y ~ 1, data = datasets)
null_models <- vector("list", length = length(datasets))
FS_fits <- vector("list", length = length(datasets))
FS_fits # returns a list of NULL elements (15 of them for the practice folder)
set.seed(11) # for reproducibility
for (j in seq_along(datasets)) {
  null_models[[j]] <- lm(formula = Y ~ 1, data = datasets[[j]])
  FS_fits[[j]] <- step(object = null_models[[j]],
                       scope = formula(null_models[[j]]),
                       direction = 'forward',
                       trace = 0)
}
```
However, when I ran that and checked my work, I got the following:
```
> head(null_models, n = 1)
[[1]]
Call:
lm(formula = Y ~ 1, data = datasets[[j]])
Coefficients:
(Intercept)
1.017
> head(FS_fits, n = 1)
[[1]]
Call:
lm(formula = Y ~ 1, data = datasets[[j]])
Coefficients:
(Intercept)
1.017
> names(coef(FS_fits[[1]]))
[1] "(Intercept)"
> names(coef(FS_fits[[2]]))
[1] "(Intercept)"
> names(coef(FS_fits[[3]]))
[1] "(Intercept)"
```
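One likely explanation, worth checking against `?step`: in a forward search, `step()` can only add terms that appear in the upper component of `scope`. With `scope = formula(null_models[[j]])` the upper model is itself `Y ~ 1`, so there is nothing to add and the intercept-only model is returned unchanged. The sketch below, on simulated data (`make_dataset()`, `X1` to `X5` and the true model are hypothetical stand-ins for the CSV files), first reproduces the problem and then passes the intercept-only formula as the lower bound and the full model's formula as the upper bound:

```r
# Hypothetical data-generating model: Y depends on X1 and X2 only.
set.seed(11)
make_dataset <- function(n = 200) {
  X <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("X", 1:5))))
  X$Y <- 1 + 2 * X$X1 - 1.5 * X$X2 + rnorm(n)
  X
}
datasets <- list(make_dataset(), make_dataset())

null_models <- vector("list", length = length(datasets))
FS_fits     <- vector("list", length = length(datasets))
for (j in seq_along(datasets)) {
  # Lower bound: intercept only.
  null_models[[j]] <- lm(Y ~ 1, data = datasets[[j]])
  # Upper bound: formula() of a fitted Y ~ . model expands "." to X1 + ... + X5.
  upper_formula <- formula(lm(Y ~ ., data = datasets[[j]]))
  FS_fits[[j]] <- step(null_models[[j]],
                       scope = list(lower = formula(null_models[[j]]),
                                    upper = upper_formula),
                       direction = "forward", trace = 0)
}

# Reproduces the original problem: the upper scope is Y ~ 1, so nothing is added.
broken_fit <- step(null_models[[1]], scope = formula(null_models[[1]]),
                   direction = "forward", trace = 0)

names(coef(broken_fit))
lapply(FS_fits, function(fit) names(coef(fit)))
```

If the `full_model` list from Step 3 is still in memory, `upper = formula(full_model[[j]])` should serve equally well as the upper bound and avoids refitting. Passing `scope = Y ~ .` directly is unlikely to help: `step()` combines the scope with the starting model via `update.formula()`, where `.` stands for the starting model's right-hand side (here just `1`), not for the columns of the data.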