I had 628 predictors after creating dummy variables for all categorical variables. After running many iterations of traditional logistic regression, I arrived at 15 variables that gave a good model: good ROC, recall, and precision (at a certain cut-off) on test data, with all variables significant (p <= 0.05). Because that took a lot of time, I tried lasso, which gave me 50 variables with non-zero coefficients at the best lambda value from 10-fold cross-validation. However, only 5 variables were common to the 15 from the traditional method and the 50 from lasso. Moreover, when I calculated SEs and t-statistics for the lasso coefficients, many variables turned out to be insignificant (low t-statistics and high p-values). In addition, the ROC AUC was lower than with the traditional method. The ROC AUC drops even further when I fit a traditional logistic regression on the 50 variables selected by lasso. Can someone help me understand what is going on, and how I can justify the coefficients of the lasso model given that they are penalized? (I normalized all variables before using lasso.)
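To make the penalization concrete, here is a minimal sketch (not the actual model from this question) of L1-penalized logistic regression fitted by proximal gradient descent on toy data. Everything in it is an assumption for illustration: the helper names (`make_data`, `fit_l1_logistic`, and so on), the four synthetic features, the step size, and the omission of an intercept. It shows two effects that may explain the observations above: the soft-thresholding step shrinks every coefficient toward zero, so the fitted values are biased and ordinary SEs and t-statistics do not apply to them directly, and with nearly duplicate predictors (common among dummy variables) the split between them can be somewhat arbitrary.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def gradient(X, y, beta):
    # Gradient of the mean logistic loss (no intercept, for brevity).
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for row, target in zip(X, y):
        err = sigmoid(sum(b * x for b, x in zip(beta, row))) - target
        for j in range(p):
            grad[j] += err * row[j] / n
    return grad

def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    # Proximal gradient descent on: mean log-loss + lam * sum(|beta_j|).
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = gradient(X, y, beta)
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta

def make_data(seed=0, n=200):
    # Hypothetical toy data: x2 is a near-copy of x1; x3 and x4 are pure noise.
    # Features are only approximately standardized (drawn from N(0, 1)).
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        X.append([x1, x1 + 0.05 * rng.gauss(0, 1),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(1.0 if rng.random() < sigmoid(2.0 * x1) else 0.0)
    return X, y

X, y = make_data()
# Smallest penalty at which every coefficient stays at exactly zero.
lam_max = max(abs(g) for g in gradient(X, y, [0.0] * len(X[0])))
for lam in (0.0, 0.3 * lam_max, lam_max):
    coefs = fit_l1_logistic(X, y, lam)
    print(f"lambda={lam:.3f}", [round(b, 3) for b in coefs])
```

Comparing the printed rows for the different penalties shows how much the coefficients shrink as lambda grows. In practice a library implementation with an intercept and cross-validated lambda would be used instead of this hand-rolled loop.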
-
You might want to ask this question on Cross Validated rather than Stack Overflow. – jwimberley Apr 26 '17 at 11:50
-
Please help me understand how to interpret the coefficients of the variables that I got from lasso (the data was normalized before using lasso). Moreover, the majority of these variables do not come out significant when I calculate the SE and t-statistic for the lasso coefficients, and they are also not significant by p-value in the traditional model fitted using only the variables that lasso found to be non-zero. How can I establish the significance of a variable and explain its interpretation to the business? – Sanjay Apr 27 '17 at 05:32
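On the interpretation part, one possible sketch, assuming "normalized" here means z-score standardization (subtract the mean, divide by the standard deviation): a coefficient fitted on standardized data can be converted back to the original units by dividing by that variable's standard deviation, and its exponential is then the odds ratio per one-unit change. The function names and numbers below are hypothetical. Because lasso coefficients are shrunk, these odds ratios would understate the unpenalized effect sizes.

```python
import math

def unstandardize(intercept_std, coefs_std, means, sds):
    # Map coefficients fitted on z = (x - mean) / sd back to the scale of x.
    coefs = [b / s for b, s in zip(coefs_std, sds)]
    intercept = intercept_std - sum(b * m for b, m in zip(coefs, means))
    return intercept, coefs

def odds_ratios(coefs):
    # Multiplicative change in the odds per one-unit increase in each variable.
    return [math.exp(b) for b in coefs]

# Hypothetical values: two variables with means 10 and 0, and sds 2 and 4.
intercept, coefs = unstandardize(0.5, [1.0, -2.0], [10.0, 0.0], [2.0, 4.0])
print(intercept, coefs, [round(r, 3) for r in odds_ratios(coefs)])
```

For a dummy variable, the back-transformed coefficient is the log-odds difference between the category and the reference level, which is usually easier to explain to a business audience than the standardized value.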