
I'm getting completely different results when I run a logistic regression in R versus Python on the same data: the intercepts and coefficients are very different from each other.

I've seen this problem posted here before, but the explanation in that case was that the X and Y variables exhibited perfect separation. In my data there is no perfect separation.

Here's a reproducible example in R:

x_examp <- c(1,4,7,9,13,17,22,25,29,30,35,40,44,47,50)
y_examp <- c(1,1,1,1,0,1,0,0,1,0,0,0,0,0,0)
mod = glm(y_examp ~ x_examp, family = 'binomial')
summary(mod)

which gives these coefficients (the estimates):

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  3.15324    1.75197   1.800   0.0719 .
x_examp     -0.16534    0.07996  -2.068   0.0387 *

And here's a logistic regression in Python using the same data:

import numpy as np

x_examp = np.array([1,4,7,9,13,17,22,25,29,30,35,40,44,47,50])
x_examp = x_examp.reshape(-1, 1)
y_examp = np.array([1,1,1,1,0,1,0,0,1,0,0,0,0,0,0])

from sklearn.linear_model import LogisticRegression

LR = LogisticRegression()
LR.fit(x_examp, y_examp)

print('intercept:', LR.intercept_)
print('coefficient:', LR.coef_[0])

which returns:

intercept: [ 1.11232593]
coefficient: [-0.08579351]

Given that the standard errors are calculated from the predicted values, which in turn depend on the coefficients, the standard errors will also differ from those in R, and the z-statistics and corresponding p-values will differ as well.

Clearly the results are very different. Does anyone know why this is the case, and which one is correct?

pd441
  • Set `C` to a huge value to get the same fit. – hellpanderr Oct 10 '18 at 15:10
  • @hellpanderr thanks. could you elaborate? Where is `C` set? – pd441 Oct 10 '18 at 15:13
  • 2
    Regularization constant in sklearn classifier – hellpanderr Oct 10 '18 at 15:15
  • Thanks! now they are the same – pd441 Oct 10 '18 at 15:16
  • Not only does sklearn assume all regressions are regularized (see lasso), but last I checked, it didn't even standardize variables correctly. It makes a lot of assumptions and choices on behalf of its users, which is unfortunate because many lack the stats or ML experience to know what those are. It would seem to violate John Chambers' _prime directive_ in Software for Data Analysis... – Justin Oct 10 '18 at 17:48
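To make the comments above concrete: sklearn's `LogisticRegression` applies an L2 penalty by default (`C=1.0`, where `C` is the *inverse* regularization strength), while R's `glm(..., family = 'binomial')` fits an unpenalized maximum-likelihood model. A minimal sketch of the fix, setting `C` to a huge value so the penalty becomes negligible:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x_examp = np.array([1,4,7,9,13,17,22,25,29,30,35,40,44,47,50]).reshape(-1, 1)
y_examp = np.array([1,1,1,1,0,1,0,0,1,0,0,0,0,0,0])

# C is the inverse of the regularization strength; a huge C
# effectively disables the L2 penalty, recovering the plain MLE
# that R's glm computes.
LR = LogisticRegression(C=1e9)
LR.fit(x_examp, y_examp)

print('intercept:', LR.intercept_)    # close to R's 3.15324
print('coefficient:', LR.coef_[0])    # close to R's -0.16534
```

In recent sklearn versions you can instead pass `penalty=None` (or `penalty='none'` in older releases) to drop the penalty explicitly rather than approximating it with a large `C`.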

0 Answers