I'm getting completely different results when I run a logistic regression in R versus in Python on the same data: the intercepts and coefficients are very different from each other.
I've seen this same problem posted here before, but the solution there was that the data set in question exhibited perfect separation between X and Y; my own data has no perfect separation.
Here's a reproducible example in R:
x_examp <- c(1,4,7,9,13,17,22,25,29,30,35,40,44,47,50)
y_examp <- c(1,1,1,1,0,1,0,0,1,0,0,0,0,0,0)
mod <- glm(y_examp ~ x_examp, family = 'binomial')
summary(mod)
which gives these coefficient estimates:
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.15324    1.75197   1.800   0.0719 .
x_examp     -0.16534    0.07996  -2.068   0.0387 *
And here's a logistic regression in Python using the same data:
import numpy as np
from sklearn.linear_model import LogisticRegression

x_examp = np.array([1,4,7,9,13,17,22,25,29,30,35,40,44,47,50])
x_examp = x_examp.reshape(-1, 1)   # sklearn expects a 2-D feature matrix
y_examp = np.array([1,1,1,1,0,1,0,0,1,0,0,0,0,0,0])

LR = LogisticRegression()
LR.fit(x_examp, y_examp)
print('intercept:', LR.intercept_)
print('coefficient:', LR.coef_[0])
which returns:
intercept: [ 1.11232593]
coefficient: [-0.08579351]
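For a third point of comparison, here's a sketch of an unpenalized maximum-likelihood fit in Python using statsmodels (assuming statsmodels is installed; I haven't confirmed whether its output matches either result above):

import numpy as np
import statsmodels.api as sm

x_examp = np.array([1,4,7,9,13,17,22,25,29,30,35,40,44,47,50])
y_examp = np.array([1,1,1,1,0,1,0,0,1,0,0,0,0,0,0])

X_design = sm.add_constant(x_examp)          # add an intercept column
sm_fit = sm.Logit(y_examp, X_design).fit()   # plain maximum-likelihood fit
print(sm_fit.params)                         # [intercept, slope]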
Given that the standard errors are calculated from the predicted values, which in turn depend on the coefficients, the standard errors will differ from those calculated in R, and the z-statistics and corresponding p-values will differ as well.
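To illustrate that point, here's a rough sketch of how standard errors could be recomputed from the sklearn coefficients using the usual inverse-Fisher-information formula (this is my own calculation, not something sklearn reports):

import numpy as np

# Design matrix with an explicit intercept column, matching what R's glm fits
X_design = np.column_stack([np.ones(len(x_examp)), x_examp.ravel()])
beta = np.r_[LR.intercept_, LR.coef_[0]]        # [intercept, slope] from the sklearn fit
p = 1.0 / (1.0 + np.exp(-X_design @ beta))      # predicted probabilities
W = np.diag(p * (1.0 - p))                      # logistic variance weights
cov = np.linalg.inv(X_design.T @ W @ X_design)  # inverse Fisher information
print(np.sqrt(np.diag(cov)))                    # standard errors of [intercept, slope]

Since these use the sklearn coefficients rather than R's, I'd expect them to differ from the R summary output.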
Clearly the results are very different. Does anyone know why this is the case, and which one is correct?