
Suppose I want to fit an elastic net model. I have a matrix X of 1000 observations of 1000 variables each and a vector y of 1000 class labels. I would like to compare different feature selection approaches that are applied before the elastic net is used to predict y (family = "binomial").

In both approaches I select 500 features by some method (e.g. at random), resulting in two different feature sets that may overlap, each with 1000 observations. Next I fit two models using the glmnet package in R, one for each of the two sets.

Can I compare both models using Akaike Information Criterion although they don't "share" a saturated model?

From this post (Is there a way in R to determine AIC from cv.glmnet?) I know that the log-likelihood can be obtained from the glmnet model via:

2 * log-likelihood = -(1 - fit$dev.ratio) * fit$nulldev
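To see why this recovers the log-likelihood: for a binomial model with 0/1 responses, the saturated model fits every observation perfectly, so its log-likelihood is 0 and the deviance equals -2 * logLik exactly. A minimal base-R sketch checking that identity (using plain glm on made-up data rather than glmnet, so that logLik is directly available):

```r
# For a binomial model with 0/1 responses, the saturated model has
# log-likelihood 0, so deviance = -2 * logLik exactly. This is why the
# glmnet identity above recovers 2 * logLik from dev.ratio and nulldev.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)

dev <- deviance(fit)
ll  <- as.numeric(logLik(fit))   # log-likelihood of the fitted model
k   <- length(coef(fit))         # number of estimated coefficients

stopifnot(all.equal(dev, -2 * ll))          # deviance = -2 * logLik
stopifnot(all.equal(AIC(fit), dev + 2 * k)) # AIC = deviance + 2k
```

With other binomial data formats (grouped counts) the saturated log-likelihood is generally nonzero, so the deviance only gives the log-likelihood up to that constant.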
  • can you explain why you would want to do feature selection *before* elastic net? – Ben Bolker Sep 08 '21 at 14:42
  • I've really got a lot more than 1000 features. This was just for the sake of the example. Due to memory limitations I need to boil it down beforehand. – TauSigma Sep 08 '21 at 15:58

1 Answer


The answer seems to be: yes, models with different predictors can be compared using AIC, as long as they are fit to the same response data.

https://stats.stackexchange.com/questions/228436/what-breaks-the-comparibility-of-models-with-respect-to-the-aic
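As an illustration of that conclusion (a sketch using plain glm on made-up data rather than glmnet, since a penalized fit complicates the degrees-of-freedom count): two logistic models fit to the same response y but different feature subsets share the same saturated model, so their AIC values are directly comparable:

```r
# Hypothetical example: same response y, two different (possibly
# overlapping) feature subsets. Because y is identical in both fits,
# the saturated model is the same and the AICs can be compared.
set.seed(42)
n <- 300; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))

set1 <- sample(p, 5)              # feature subset for model 1
set2 <- sample(p, 5)              # feature subset for model 2
fit1 <- glm(y ~ X[, set1], family = binomial)
fit2 <- glm(y ~ X[, set2], family = binomial)

c(AIC1 = AIC(fit1), AIC2 = AIC(fit2))  # lower is preferred
```

For the glmnet case itself, the deviance-based formula above gives -2 * logLik per lambda, and AIC can then be formed with some choice of degrees of freedom (e.g. fit$df, the number of nonzero coefficients); note that counting parameters for a penalized fit is itself a judgment call.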