
I am using R to build a predictive model with binary logistic regression and the lasso penalty.

Originally, the data set consisted of 63147 observations and 22 variables, with 3% of the observations coming from $G_1$ and 97% from $G_2$. Since this is very unbalanced, I have taken a sample of size 5000 in which 30% of the observations come from $G_1$ and 70% from $G_2$.
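
A minimal sketch of this kind of subsampling, on simulated data because the original data set is not shown. The data frame `full` and its columns are hypothetical; only the sizes and proportions are taken from the description above.

```r
set.seed(1)

# Hypothetical stand-in for the full data set: 63147 rows, about 3% in G1.
n <- 63147
full <- data.frame(
  y  = factor(ifelse(runif(n) < 0.03, "G1", "G2")),
  X1 = rnorm(n)
)

# Draw 1500 rows from G1 (30%) and 3500 rows from G2 (70%), without replacement.
idx_g1 <- sample(which(full$y == "G1"), 1500)
idx_g2 <- sample(which(full$y == "G2"), 3500)
sub <- full[c(idx_g1, idx_g2), ]

# Class proportions in the subsample.
prop.table(table(sub$y))
```

Sampling on the outcome like this shifts the intercept of a logistic model, so predicted probabilities refer to the 30/70 sample rather than to the original 3/97 population.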

I have tried fitting two models: the classical binary logistic regression (BLR), using glm through train() with method = "glm", and binary logistic regression with the lasso penalty, using the glmnet package.

I have scaled my data using the scale function in R because the variables are measured on different scales.

When fitting the BLR, the following warnings occurred:

> BLR.Model.SubPop <- train(y~., data = Train.Data.SubPop, method = "glm", family = "binomial")
There were 47 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
3: glm.fit: algorithm did not converge
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: glm.fit: algorithm did not converge
6: glm.fit: fitted probabilities numerically 0 or 1 occurred
7: glm.fit: algorithm did not converge

From the research I have done, this is due to separation in the data.
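
As a small illustration of what separation does to glm, here is a sketch on toy data (not the data from this question) in which the outcome is perfectly determined by the sign of a single predictor. The variable names are made up for the example.

```r
# Toy data, perfectly separated: y is 1 exactly when x > 0.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Collect the warnings glm() raises instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs        # warnings collected from glm.fit
coef(fit)   # the slope is very large because no finite MLE exists
fitted(fit) # fitted probabilities are pushed towards 0 and 1
```

With separated data the likelihood keeps increasing as the slope grows, so the iterations never settle on a finite estimate.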

I then opted to use BLR with the lasso penalty, using the cv.glmnet() function to find lambda.min and lambda.1se.

Below are the coefficients for these two values of lambda:

> cv.lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")
> plot(cv.lasso)
> cv.lasso$lambda.min
[1] 5.575006e-05
> cv.lasso$lambda.1se
[1] 0.0001173485
> coef(cv.lasso,cv.lasso$lambda.min)[,1]
 (Intercept)           X1           X2           X3           X4
 -94.7714913            0   0.17288133  -0.28371818   0.03050103
          X5           X6           X7           X8           X9
  0.02482283            0            0  -0.26218308            0
         X10          X11          X12          X13          X14
           0  -2.00853016            0            0            0
         X15          X16          X17          X18          X19
 -0.01768456   0.56538543            0            0   0.54166489
         X20          X21
 30.10519005   0.18198277

(Intercept) X1  X2            X3            X4
-71.93503132    0   0.0644656   -0.287559336    0.001068958
X5         X6   X7  X8  X9
0.017905135 0   0   0   0
X10 X11        X12  X13 X14
0   -1.239442745    0   0   0
X15        X16  X17 X18 X19
-0.083885831    0   0   0   0.19158206
X20         X21         
22.99517148 0.052543166         

When I tried to fit the lasso with glmnet() at lambda.min and lambda.1se, I encountered the warnings below.

> lasso.model.1se <- glmnet(x, y, alpha = 1, family = "binomial", lambda = cv.lasso$lambda.1se)
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned 
2: In getcoef(fit, nvars, nx, vnames) :
  an empty model has been returned; probably a convergence issue
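
For comparison, the glmnet documentation advises against supplying a single value for lambda and recommends fitting the whole path and then selecting the value of interest. A minimal sketch of that approach follows, on simulated data standing in for the real x and y; the object names and the data-generating model are assumptions made for the example.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the real predictors and response (assumption).
n <- 500
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

cv.fit <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")

# cv.glmnet() stores the full-data path fit in $glmnet.fit, so coefficients
# at a chosen lambda can be read off without a single-lambda glmnet() call.
coef.1se <- coef(cv.fit, s = "lambda.1se")
coef.min <- coef(cv.fit, s = "lambda.min")

# Alternative route: fit the whole path once, then pick lambda through `s`.
path.fit  <- glmnet(x, y, alpha = 1, family = "binomial")
coef.path <- coef(path.fit, s = cv.fit$lambda.1se)
```

Fitting the full decreasing sequence lets each solution be warm-started from the previous one, which is typically where a cold single-lambda fit at a very small lambda struggles.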

However, the code below runs without any warning:

LASSO.prob <- cv.lasso %>% predict(newx=x.test,type = "response")

I am not sure which lambda cv.lasso is working with here: is it lambda.min or lambda.1se? And why am I getting the warnings above?
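
One way to check which lambda is used is to pass it explicitly through the s argument of predict() and compare with the call that omits it. The sketch below uses simulated data in place of x, y and x.test; the default shown, s = "lambda.1se", is the documented default of predict() for cv.glmnet objects.

```r
library(glmnet)

set.seed(2)
# Simulated stand-ins for the training and test matrices (assumption).
n <- 400
p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(1.5 * x[, 1]))
x.new <- matrix(rnorm(20 * p), 20, p)

cv.fit <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")

# Prediction with s omitted, and with each choice spelled out.
p.default <- predict(cv.fit, newx = x.new, type = "response")
p.1se     <- predict(cv.fit, newx = x.new, type = "response", s = "lambda.1se")
p.min     <- predict(cv.fit, newx = x.new, type = "response", s = "lambda.min")

# Compare the default call with the explicit lambda.1se call.
all.equal(as.numeric(p.default), as.numeric(p.1se))
```

Spelling out s in every predict() and coef() call removes the ambiguity about which model is being evaluated.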

I used the below link as a reference to build my code:

http://www.sthda.com/english/articles/36-classification-methods-essentials/149-penalized-logistic-regression-essentials-in-r-ridge-lasso-and-elastic-net/?fbclid=IwAR0ZTjoGqRgH5vNum9CloeGVaHdwlqDHwDdoGKJXwncOgIT98qUXGcnV70k

Lise
  • By default glmnet uses lambda.1se, was that your question? – user2974951 Sep 05 '19 at 09:02
  • @user2974951 - My question is: why am I getting the warnings? – Lise Sep 05 '19 at 09:38
  • There seems to be a convergence issue when you try to build a model after CV. First step would be to increase maxit, if that does not work change optimizer. – user2974951 Sep 05 '19 at 09:41
  • @user2974951 I tried changing maxit but got the same warning. What I have done is change the lambda value, and it seems to be working now. Do you have any insight into why this happens? – Lise Sep 05 '19 at 10:27
  • Hard to tell, maybe try checking where the model fails, for which values of lambda. Does it fail only for low or high values of lambda in the sequence of lambdas? If so, the model could not be estimated for those values. Or does it happen for random lambdas? In which case this would mean trouble. – user2974951 Sep 05 '19 at 10:35
  • I have done it for `lambda.min`, which did not work, but then I tried `lambda=0.0001` and it worked. As can be seen above, `lambda.min = 5.575 x 10^-5`. I had chosen an arbitrary lambda. – Lise Sep 05 '19 at 10:42
