I need to run the enet() function from the elasticnet library in RStudio on each of these 47,000 datasets individually. They were created in such a way that we know the true underlying population for each dataset, and we want to see how often the new algorithm recovers it compared with LASSO and Stepwise, as well as the runtime of each.
I have no idea how to do this, or even what search terms to use to look it up; I have already tried both Google and Bing several times. I believe the only packages my code requires as it stands are:
- leaps
- lars
- stats
- plyr
- dplyr
- readr
- elasticnet
This is my code to run the LASSO (obviously, I made up the dataframe names for the x & y arguments in the enet() function for this post/question lol):
## Attempt 2: Run a LASSO regression using
## the enet function from the elasticnet library
set.seed(11)
library(elasticnet)
enet_LASSO <- enet(x = as.matrix(df_all_obs_on_all_of_the_IVs),
                   y = df_all_obs_on_the_DV,
                   lambda = 0, normalize = FALSE)
print(enet_LASSO)
# In order to ascertain which predictors/regressors are still
# included in the model after running a LASSO regression on it
# for the purpose of variable selection, I am going to use the
# predict() generic, which dispatches to the method that the
# elasticnet package provides for enet objects.
# Note: the data argument of that method is named 'newx', not 'x',
# and it is only used when type = "fit".
LASSO_coeffs <- predict(enet_LASSO,
                        newx = as.matrix(df_all_obs_on_all_of_the_IVs),
                        s = 0.1, mode = "fraction", type = "coefficients")
print(LASSO_coeffs)
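To read off which regressors survive the selection, the nonzero entries of the coefficient vector can be pulled out by name. Here is a minimal sketch on made-up toy data. The matrix X, the vector y, and the column names X1 to X5 are all hypothetical, and it assumes that predict() on an enet object returns a list whose $coefficients element is a named numeric vector:

```r
library(elasticnet)

set.seed(11)
# Hypothetical toy data: 100 observations, 5 candidate regressors,
# of which only X1 and X3 enter the data-generating process.
X <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("X", 1:5)))
y <- 2 * X[, "X1"] - 3 * X[, "X3"] + rnorm(100)

# lambda = 0 switches off the ridge part of the elastic net, leaving a LASSO
fit <- enet(x = X, y = y, lambda = 0, normalize = FALSE)

# Coefficients at 10% of the way along the LASSO path
coefs <- predict(fit, s = 0.1, mode = "fraction",
                 type = "coefficients")$coefficients

# Names of the regressors whose coefficients were not shrunk to zero
selected <- names(coefs)[coefs != 0]
print(selected)
```

The resulting character vector can then be compared against the known true regressors for that dataset.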
Optional background context & motivation: I am in the middle of a research project comparing a new statistical learning procedure for choosing the optimal regression specification against two benchmarks (LASSO & Stepwise). I am running all three as a Monte Carlo experiment on a synthetic dataset my collaborator created for me, which consists of a multi-GB folder containing 47,000 individual csv files.
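To make the goal concrete, this is roughly the shape of loop I imagine, offered as a guess rather than a tested solution. Everything about the file layout is an assumption: the sketch first writes three small synthetic csv files to a temporary folder, and it supposes each file has the dependent variable in a column named Y and the regressors in the remaining columns. The folder name, file names, column names, and the helper run_lasso_on_file() are all hypothetical.

```r
library(elasticnet)

# --- Hypothetical setup: write 3 small synthetic csv files to a temp folder ---
set.seed(11)
folder <- file.path(tempdir(), "synthetic_datasets")
dir.create(folder, showWarnings = FALSE)
for (i in 1:3) {
  X <- matrix(rnorm(50 * 4), nrow = 50,
              dimnames = list(NULL, paste0("X", 1:4)))
  Y <- X[, "X1"] + 2 * X[, "X2"] + rnorm(50)
  write.csv(data.frame(Y = Y, X),
            file.path(folder, paste0("dataset_", i, ".csv")),
            row.names = FALSE)
}

# --- Fit one LASSO per file; record the selected regressors and the runtime ---
run_lasso_on_file <- function(path) {
  df <- read.csv(path)
  x <- as.matrix(df[, setdiff(names(df), "Y")])  # assumed regressor columns
  y <- df$Y                                      # assumed DV column
  elapsed <- system.time({
    fit <- enet(x = x, y = y, lambda = 0, normalize = FALSE)
    coefs <- predict(fit, s = 0.1, mode = "fraction",
                     type = "coefficients")$coefficients
  })[["elapsed"]]
  list(file = basename(path),
       selected = names(coefs)[coefs != 0],
       seconds = elapsed)
}

# One list entry per csv file found in the folder
csv_paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(csv_paths, run_lasso_on_file)
names(results) <- basename(csv_paths)
str(results)
```

Pointing folder at the real directory (and dropping the setup loop) would in principle apply the same function to all 47,000 files, since lapply() simply walks the vector of paths one file at a time.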