
I am trying to fit a logistic regression with LASSO using the glmnet package, and I need to force the model to include certain variables. However, I get an error:

    > cv.lasso = cv.glmnet(x,y,family="binomial",alpha = 1,penalty.factor = penalty)
    Error: Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
    In addition: Warning messages:
    1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
    2: In getcoef(fit, nvars, nx, vnames) :
      an empty model has been returned; probably a convergence issue

x has 95 variables, all binary (0 or 1). I need to force 3 of them to be included, so I set their entries in penalty.factor to 0:

   > penalty
   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [75] 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1
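
For reference, that vector can be built programmatically. A minimal sketch, reading the three mandatory positions off the printout above (52, 87, and 88):

    # penalty multipliers: 1 for penalized columns, 0 for the three
    # variables that must stay in the model (positions 52, 87, 88 here)
    penalty <- rep(1, ncol(x))
    penalty[c(52, 87, 88)] <- 0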

If I remove penalty.factor, it works, but I need those three variables forced in. Conversely, if I keep penalty.factor but remove family = "binomial", it runs, but it is no longer a binary logistic regression. Does anyone know how to fix this?

Edit: Since I don't have a solution and am under pressure to show results ASAP, I have chosen to take the variables selected by LASSO, combine them with the three mandatory variables, and run a regular logit regression. I suspect there is an issue with doing this, though...
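
In case it helps, a rough sketch of that two-step fallback (the names cv.fit, picked, mandatory, and dat are illustrative, and it assumes x has syntactic column names):

    library(glmnet)

    # step 1: run the LASSO without penalty.factor (the call that works)
    # and read off the variables with nonzero coefficients
    cv.fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
    betas  <- coef(cv.fit, s = "lambda.min")
    picked <- setdiff(rownames(betas)[as.vector(betas) != 0], "(Intercept)")

    # step 2: refit a plain logistic regression on the selected variables
    # plus the three mandatory ones (positions 52, 87, 88 from above)
    mandatory <- colnames(x)[c(52, 87, 88)]
    vars      <- union(picked, mandatory)
    dat       <- data.frame(y = y, x)
    fit       <- glm(y ~ ., data = dat[, c("y", vars)], family = binomial)
    summary(fit)

One known caveat (probably the issue I sense): the z values and p values from step 2 ignore the selection done in step 1, so they will be optimistic.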

Thank you!

Z. Zhang
  • This is a convergence issue. Since all of your features are binary and this is a logistic model, this is plausibly due to the Hauck-Dauber effect. – Sycorax Feb 02 '16 at 19:13
  • @user777 Thanks for your answer. Sorry, I am not familiar with convergence issues. Do you have any suggestions to fix it? – Z. Zhang Feb 02 '16 at 19:18
  • Check out the Hauck Dauber effect tag on CV.SE. – Sycorax Feb 02 '16 at 19:31
  • @user777 I don't think it is related to the Hauck Dauber effect, since none of those three variables (the ones I want to keep) can perfectly separate the outcome. – Z. Zhang Feb 02 '16 at 22:11
  • What about in combination? – Sycorax Feb 02 '16 at 23:29
  • @user777 Do you also suggest combining them? I feel like that would be problematic. So far I see many insignificant predictors (using regular glm, I can see z values and p values). – Z. Zhang Feb 02 '16 at 23:35
  • @user777 Sorry, I guess I misunderstood. You meant combining those three variables? – Z. Zhang Feb 02 '16 at 23:37
  • My question is "Is there a combination of those three that results in perfect separation?" – Sycorax Feb 02 '16 at 23:38
  • @user777 Thanks. No, I don't think so (to check, I ran a logit regression using only those three). On the other hand, if they could separate perfectly, I wouldn't even need LASSO. – Z. Zhang Feb 03 '16 at 00:10
  • In that case, try playing with the sequence of lambda values. – Sycorax Feb 03 '16 at 00:28 [a sketch of this appears below the thread]
  • @Sycorax: it's called **[Hauck-Donner effect](http://stats.stackexchange.com/tags/hauck-donner-effect/info)** – smci Feb 02 '17 at 04:00
  • @smci Thanks for the correction. I was typing on my phone and it got mangled by auto-correct. – Sycorax Feb 02 '17 at 04:17
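
Following up on Sycorax's lambda suggestion above, a hedged sketch of supplying an explicit lambda sequence and a larger maxit; the grid below is illustrative, not tuned:

    # illustrative lambda grid, decreasing on a log scale; supplying it
    # explicitly controls where the regularization path starts and stops
    lambdas <- exp(seq(log(1), log(0.01), length.out = 100))

    # maxit is passed through to glmnet; raising it past the default
    # 100000 gives the first (largest) lambda a chance to converge
    cv.lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                          penalty.factor = penalty,
                          lambda = lambdas, maxit = 1e6)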

0 Answers