
I want to obtain confidence intervals for LASSO regression. For this, I used the selectiveInference package in R.

The fixedLassoInf function in this package provides confidence intervals for the lasso at a given value of lambda. We can pass it the coefficient vector obtained from the glmnet package.

The coefficients for LASSO logistic regression at a given lambda, obtained with the glmnet package, are as follows:

    require(ISLR)
    require(glmnet)
    require(selectiveInference)

    y1 <- Default$default
    x1 <- model.matrix(default ~ student + balance + income + student*income, Default)[, -1]

    lasso.mod1 <- glmnet(x1, y1, alpha = 1, lambda = 0.0003274549, family = 'binomial')

    lasso.mod1$beta

    > lasso.mod1$beta
    4 x 1 sparse Matrix of class "dgCMatrix"
                                 s0
    studentYes        -6.131640e-01
    balance            5.635401e-03
    income             2.429232e-06
    studentYes:income  .

Then I used the fixedLassoInf function from the selectiveInference package to get the confidence intervals:

    y1 <- Default$default

    beta <- coef(lasso.mod1, x = x1, y = y1, s = lambda/1000, exact = TRUE)
    y1 <- ifelse(y1 == "NO", 0, 1)

    out <- fixedLassoInf(x1, y1, beta, lambda, family = "binomial", alpha = 0.05)
    out

However, I am getting the following warning messages:

    Warning messages:
    1: In fixedLogitLassoInf(x, y, beta, lambda, alpha = alpha, type = "partial",  :
      Solution beta does not satisfy the KKT conditions (to within specified tolerances)
    2: In fixedLogitLassoInf(x, y, beta, lambda, alpha = alpha, type = "partial",  :
      Solution beta does not satisfy the KKT conditions (to within specified tolerances). You might try rerunning glmnet with a lower setting of the 'thresh' parameter, for a more accurate convergence.
    3: glm.fit: algorithm did not converge

The output also does not look correct:

    Call:
    fixedLassoInf(x = x1, y = (y1), beta = beta, lambda = lambda, 
        family = "binomial", alpha = 0.05)

    Testing results at lambda = 0.000, with alpha = 0.050

     Var     Coef   Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
       1 1142.801  1884.776       1      -Inf  -60.633           0          0
       2    0.386  1664.734       0     0.023      Inf           0          0
       3    0.029  3318.110       0     0.001      Inf           0          0
       4   -0.029 -1029.985       1      -Inf   -0.003           0          0

    Note: coefficients shown are partial regression coefficients 

Based on the warning messages, there is a problem with the Karush-Kuhn-Tucker (KKT) conditions.

Can anyone help me to figure this out?

Thank you.

Vitali Avagyan
student_R123
  • Better yet, why are you trying to get CI from a LASSO model? – user2974951 Sep 10 '19 at 11:35
  • The glmnet package does not provide standard errors, so CIs cannot be calculated. – student_R123 Sep 10 '19 at 12:41
  • It doesn't provide them for a reason, in that it does not really make sense. What are you trying to achieve with CI? – user2974951 Sep 10 '19 at 12:42
  • Just for inference purposes. – student_R123 Sep 10 '19 at 13:13
  • LASSO is not made for inference; its main purpose is prediction. You can google the subject and you will find many references for why this isn't done and why it's a bad idea. If you want inference you should choose a different model. – user2974951 Sep 10 '19 at 13:55
  • This is a worthy question: there is a whole literature on inference after model selection, and there are closed-form solutions for adjusting standard errors by conditioning on the event that variables were selected via the LASSO. `selectiveInference` is a package that does just that. I have a similar question here: https://stackoverflow.com/questions/68147109/r-how-to-translate-lasso-lambda-values-from-the-cv-glmnet-function-into-the-s – Mark White Jun 27 '21 at 00:17

1 Answer


One of my university teachers always said

Fitting is an art, not a technique.

What I mean: expect to do manual work for parameter guessing and multiple iterations of fitting. You might even question the method of fitting itself, but let's not go down that path.

Anyhow, R will not do the magic of finding the correct model (here: the number of parameters for the LASSO) for you. From the output you show, you seem to have 4 variables, of which 3 are close to zero, so I suggest starting with...

  1. Bounding the maximal number of variables in the model, e.g. dfmax = 2 seems a good start
  2. Limiting the maximum number of variables ever to be nonzero, e.g. pmax = 2

The glmnet documentation gives details on further options.
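To make this concrete, here is a rough sketch (not a tested fix) on the same Default data as in the question. Two details from my reading of the fixedLassoInf documentation seem relevant: glmnet scales its objective by 1/n, so the lambda handed to fixedLassoInf should be n times glmnet's per-observation lambda, and for family = "binomial" the beta vector is expected to include the intercept. Note also that Default$default has levels "No"/"Yes", so the question's ifelse(y1 == "NO", 0, 1) codes every observation as 1. The lambda value below is copied from the question; the thresh setting is illustrative.

```r
library(ISLR)               # Default data set
library(glmnet)
library(selectiveInference)

x1 <- model.matrix(default ~ student + balance + income + student*income,
                   Default)[, -1]
y1 <- ifelse(Default$default == "Yes", 1, 0)  # levels are "No"/"Yes", not "NO"

n <- nrow(x1)
s <- 0.0003274549           # glmnet's per-observation lambda (from the question)

# Tighter convergence via a lower 'thresh', as the warning suggests;
# 'standardize = FALSE' follows the fixedLassoInf examples. To cap the
# number of nonzero variables, dfmax = 2, pmax = 2 could be added here.
fit <- glmnet(x1, y1, alpha = 1, family = "binomial", lambda = s,
              standardize = FALSE, thresh = 1e-12)

beta <- as.numeric(coef(fit))  # length p + 1: intercept first, as the
                               # binomial case of fixedLassoInf expects

# fixedLassoInf uses an objective without the 1/n factor, so rescale lambda.
out <- fixedLassoInf(x1, y1, beta, lambda = n * s,
                     family = "binomial", alpha = 0.05)
out
```

Whether this removes the KKT warning depends on the data and the chosen lambda, but the n-scaling mismatch is the usual suspect when glmnet solutions are handed to fixedLassoInf.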

B--rian
  • @student_R123 For more on KKT, I found https://math.stackexchange.com/questions/2162932/big-picture-behind-how-to-use-kkt-conditions-for-constrained-optimization pretty interesting. Please let me know whether you have questions to my suggestion. – B--rian Sep 18 '19 at 13:47