I am comparing a new automated optimal variable selection technique (or, more correctly, an optimal model selection technique) against two or three standard benchmarks. For the benchmark methods, we decided on LASSO as the first and Backward Elimination (BE) Stepwise Regression as the second. Out of curiosity, I also tried to run a Forward Selection (FS) Stepwise Regression on the 47,501 synthetic datasets created for the Monte Carlo simulation underlying the benchmark comparisons. I have already run LASSO and BE Stepwise Regression successfully on all 47k datasets.

I thought that slightly altering the BE Stepwise code would be enough to get FS Stepwise working as well, but the model it selects is the same for every dataset: just the intercept. All of the R scripts, along with a much smaller practice version of the folder containing the individual CSV datasets, can be found on my GitHub repository for this research project.

Here is the code I have already used successfully to run my BE Stepwise Regressions:

directory_path <- "~/DAEN_698/other datasets/sample_obs2"
filepath_list <- list.files(path = directory_path, full.names = TRUE, 
                        recursive = TRUE)

# reformat the names of each of the csv file formatted datasets
DS_names_list <- basename(filepath_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)

## Read the data from each of the csv files,
## reusing the list of file paths created above.
datasets <- lapply(filepath_list, read.csv)

### Step 3: Run a Backward Elimination Stepwise Regression
### function on each of the 47,500 datasets.
set.seed(11)      # for reproducibility
full_model <- vector("list", length = length(datasets))
BE_fits <- vector("list", length = length(datasets))
BE_fits   # returns a list with 15 elements, all of which are NULL

set.seed(11)      # for reproducibility
for(i in seq_along(datasets)) {
  full_model[[i]] <- lm(formula = Y ~ ., data = datasets[[i]])
  BE_fits[[i]] <- step(object = full_model[[i]], scope = formula(full_model[[i]]),
                       direction = 'backward', trace = 0) }

With all of this working, I figured the only change needed to run FS Stepwise would be the following:

### Step 4/5 (optional): Run a Forward Selection Stepwise Regression
### function on each of the 47,500 datasets.
### Assign the null models to their corresponding datasets and
### store these in the object "null_models"
set.seed(11)      # for reproducibility
#datasets[[1]]$
# try this line below if the line of code after it does not run/work
#null_model = lm(datasets[[1]]$Y ~ 1, data = datasets)
null_models <- vector("list", length = length(datasets))
FS_fits <- vector("list", length = length(datasets))
FS_fits   # returns a list with 15 elements, all of which are NULL

set.seed(11)      # for reproducibility
for(j in seq_along(datasets)) {
  null_models[[j]] <- lm(formula = Y ~ 1, data = datasets[[j]])
  FS_fits[[j]] <- step(object = null_models[[j]], 
                       scope = formula(null_models[[j]]),
                       direction = 'forward',
                       trace = 0) }

However, when I ran that and checked my work, I got the following:

> head(null_models, n = 1)
[[1]]
Call:
lm(formula = Y ~ 1, data = datasets[[j]])
Coefficients:
(Intercept)  
      1.017 
> head(FS_fits, n = 1)
[[1]]
Call:
lm(formula = Y ~ 1, data = datasets[[j]])
Coefficients:
(Intercept)  
      1.017  

> names(coef(FS_fits[[1]]))
[1] "(Intercept)"
> names(coef(FS_fits[[2]]))
[1] "(Intercept)"
> names(coef(FS_fits[[3]]))
[1] "(Intercept)"
Marlen
  • you have defined the `scope` of your search as just the intercept/null model. – user20650 Aug 31 '22 at 00:00
  • see https://stackoverflow.com/questions/22913774/forward-stepwise-regression – user20650 Aug 31 '22 at 00:08
  • I tried applying the solution given in that answer to the for loop in my script that runs the FS regression, but every time I hit run, this is all I get: Error: unexpected '}' in: " direction = 'forward', scope = formula(full_model[[j]]), trace = 0) }" – Marlen Aug 31 '22 at 23:07
  • That indicates that you have an extra or missing bracket/parenthesis somewhere. You will need to show the offending line of code for more help. – user20650 Aug 31 '22 at 23:39
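
To illustrate the point made in the first comment: with `direction = 'forward'`, `step()` can only add terms that appear in `scope`, so a scope equal to the null model's formula leaves nothing to add. Below is a minimal sketch of one way to widen the scope, assuming each dataset has a response column named `Y` as in the question. The toy data frame, its column names `X1` to `X4`, and the `full_models` list are hypothetical stand-ins and are not taken from the repository.

```r
# Hypothetical toy dataset standing in for one of the csv files;
# only X1 and X3 actually drive Y here.
set.seed(11)
n <- 200
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
toy$Y <- 1 + 2 * toy$X1 - 1.5 * toy$X3 + rnorm(n)
datasets <- list(toy)

null_models <- vector("list", length = length(datasets))
full_models <- vector("list", length = length(datasets))  # hypothetical helper list
FS_fits     <- vector("list", length = length(datasets))

for (j in seq_along(datasets)) {
  # Starting point of the search: intercept only
  null_models[[j]] <- lm(formula = Y ~ 1, data = datasets[[j]])
  # Largest model the search is allowed to reach: all candidate regressors
  full_models[[j]] <- lm(formula = Y ~ ., data = datasets[[j]])
  # The scope now runs from the null model (lower) up to the full model (upper)
  FS_fits[[j]] <- step(object = null_models[[j]],
                       scope = list(lower = formula(null_models[[j]]),
                                    upper = formula(full_models[[j]])),
                       direction = 'forward',
                       trace = 0)
}

# Names of the terms kept by forward selection for the first dataset
selected <- names(coef(FS_fits[[1]]))
print(selected)
```

Which regressors end up selected depends on the data and on the AIC penalty, so the result is not guaranteed to match the true model. Separately, the `unexpected '}'` message quoted in the comments is a parse error, so R reports it before any of the loop runs; checking that every `(` in the `step()` call has a matching `)` is the first thing to look at.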

0 Answers