Coefficients and significance of lasso/ridge

Question

I had 628 predictors after creating dummy variables for all categorical variables. After many iterations of traditional logistic regression, I arrived at 15 variables that gave a pretty good model, with good ROC, recall and precision values (at a certain cut-off) on test data, and all of the variables were significant (p <= 0.05). Because that took a lot of time, I tried lasso, which gave me 50 variables with non-zero coefficients at the best lambda value from 10-fold cross-validation. However, only 5 variables were common to the 15 from the traditional method and the 50 from lasso. Moreover, when I tried to calculate standard errors and t-statistics for the lasso coefficients, many variables turned out to be insignificant (low t-statistics and high p-values). In addition, the AUC for the ROC curve was lower than with the traditional method. The AUC dropped even further when I fit a traditional logistic regression on the 50 variables selected by lasso.

Can someone help me understand the dynamics here, and how I can justify the coefficients of the lasso model given that they are penalized? (I normalized all the variables before using lasso.)
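To make "penalized" concrete, here is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent in plain Python. It is not the code used for the model above: the synthetic data, the function names (`soft_threshold`, `fit_l1_logistic`, `make_data`) and the lambda values are all assumptions chosen for illustration. It only shows that the penalty shrinks coefficients toward zero and sets some exactly to zero, so they are not on the same footing as unpenalized maximum-likelihood estimates.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=300):
    """Proximal-gradient fit of logistic regression with an L1 penalty.

    Minimises mean log-loss + lam * sum(|coef|). The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad = [0.0] * p
        grad0 = 0.0
        for row, target in zip(X, y):
            z = intercept + sum(c * x for c, x in zip(coef, row))
            prob = 1.0 / (1.0 + math.exp(-z))
            err = prob - target
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        intercept -= step * grad0
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        coef = [soft_threshold(c - step * g, step * lam) for c, g in zip(coef, grad)]
    return intercept, coef


def make_data(n=200, p=6, seed=0):
    """Hypothetical data: only the first two of p standardised features matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        z = 2.0 * row[0] - 1.5 * row[1]
        prob = 1.0 / (1.0 + math.exp(-z))
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = make_data()
_, unpenalised = fit_l1_logistic(X, y, lam=0.0)
_, penalised = fit_l1_logistic(X, y, lam=0.1)
_, heavy = fit_l1_logistic(X, y, lam=10.0)

print("lam=0.0 :", [round(c, 3) for c in unpenalised])
print("lam=0.1 :", [round(c, 3) for c in penalised])
print("lam=10.0:", [round(c, 3) for c in heavy])
```

Comparing the three printed coefficient vectors shows the effect of increasing lambda. In practice a library routine with cross-validated lambda would be used instead of this hand-rolled loop.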

You might want to ask this question on Cross Validated rather than Stack Overflow. — jwimberley, Apr 26 '17 at 11:50
Please help me understand how to interpret the coefficients of the variables that I got from lasso (the data was normalized before using lasso). Moreover, the majority of these variables do not come out significant when I calculate the SE and t-statistic on the lasso coefficients, nor are they significant when I check the p-values of the traditional model that used only the variables with non-zero lasso coefficients. How can I explain the significance of a variable and its interpretation to the business? — Sanjay, Apr 27 '17 at 05:32


0 Answers