
I know this might seem like a duplicate question, but I have already done what was suggested there and it didn't work. I'm working with 28 variables, some of them categorical, and I dropped one category from each categorical variable. As stated in the old question, I defined:

import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# statsmodels: build the model and fit it to get the results object
logit = sm.Logit(y_train, X_train)
result = logit.fit()

# sklearn: large C to effectively disable regularization
clf = LogisticRegression(C=1e8, fit_intercept=False)
clf.fit(X_train, y_train)

And then, when I check my results: in statsmodels my log-likelihood is -19661, while in sklearn my log_loss is 1.9. In statsmodels my coefficients range from -1.38 to 1.98; in sklearn they range from -0.01 to 0.02. I need the results to agree so that I can perform inference on them. What could be happening? What am I missing?

Thanks.

Juan C
  • It would help if you provided a minimal working example: include some example data and the two methods you use to perform the logistic regression. – FChm Oct 30 '19 at 15:19
  • I wanted to do that, but it's not easy to reproduce: I know that with "normal" data this should work, and I can't add 100+ rows of my actual data here either. – Juan C Oct 30 '19 at 15:32
  • Are you fitting an intercept when you use statsmodels? – FChm Oct 30 '19 at 15:33
  • Tried both ways, nothing changed – Juan C Oct 30 '19 at 15:34
  • Are your classes perfectly separable? Perhaps neither solution has converged? – FChm Oct 30 '19 at 15:40
  • I didn't get an "unable to converge" error, but I'm also not sure how to check whether the results converged. Do you know how? Thanks! – Juan C Oct 30 '19 at 15:43

0 Answers