statsmodels logistic regression odds ratio

Question

How can I get the odds ratios from a fitted logistic regression model in Python's statsmodels?

>>> import statsmodels.api as sm
>>> import numpy as np
>>> X = np.random.normal(0, 1, (100, 3))
>>> y = np.random.choice([0, 1], 100)
>>> res = sm.Logit(y, X).fit()
Optimization terminated successfully.
         Current function value: 0.683158
         Iterations 4
>>> res.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Sun, 05 Jun 2016   Pseudo R-squ.:                0.009835
Time:                        23:25:06   Log-Likelihood:                -68.316
converged:                       True   LL-Null:                       -68.994
                                        LLR p-value:                    0.5073
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0033      0.181     -0.018      0.985        -0.359     0.352
x2             0.0565      0.213      0.265      0.791        -0.362     0.475
x3             0.2985      0.216      1.380      0.168        -0.125     0.723
==============================================================================
"""
>>>

Some info here: http://blog.yhat.com/posts/logistic-regression-and-python.html — BrenBarn, Jun 05 '16 at 22:38
According to the site `OR=np.exp(res.params)` . I'm not 100% sure that that formula is right — Donbeo, Jun 05 '16 at 22:53
Is your question about the math of how to get the odds ratio, or about the programming of how to get it from statsmodels? See for instance the very end of [this page](http://www.ats.ucla.edu/stat/stata/faq/oratio.htm), which says "The end result of all the mathematical manipulations is that the odds ratio can be computed by raising e to the power of the logistic coefficient". — BrenBarn, Jun 05 '16 at 23:00
The point is that I'm not sure that this is true in multivariate regression. i.e. If more than one input variable is used. — Donbeo, Jun 05 '16 at 23:37
If your question is about the stats involved, you're probably better off asking on [Cross Validated](http://stats.stackexchange.com/). — BrenBarn, Jun 05 '16 at 23:41
I did, some time ago: http://stats.stackexchange.com/questions/208136/odds-ratio-vs-confidence-interval-in-logistic-regression. This is why I think the formula is wrong. — Donbeo, Jun 06 '16 at 00:07
@Donbeo I'm not sure what that answer means. Odds ratios are exp(params) in Logit, and you can get the confidence interval for the odds ratios by endpoint transformation, i.e. by using exp(confint()), where confint is the confidence interval for the estimated parameters. — Josef, Jun 06 '16 at 00:12
See for example Stata's `eform` (http://www.stata.com/manuals14/rglm.pdf), which has the interpretation for Logit and Poisson; something similar applies to a few other models that are based on an exp transformation, e.g. hazard ratios, IIRC. — Josef, Jun 06 '16 at 00:16
can you confirm `OR=exp(coef)` in multivariate logistic regression? — Donbeo, Jun 06 '16 at 00:19
Yes, that's what I'm saying, confirmed (because exp makes it multiplicative, so the other terms cancel in the ratio). However, the odds ratio is usually used for binary 0-1 regressors; otherwise you would have to look at the interpretation of the effect of a unit change or of the slope effect of a continuous variable. — Josef, Jun 06 '16 at 00:27

score 20 · Accepted Answer · edited Feb 18 '20 at 17:44

You can get the odds ratio with:

np.exp(res.params)
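
As a quick sanity check of why exponentiating a coefficient gives an odds ratio even with several regressors, here is a small sketch in plain NumPy. The values of `beta` and `x` are made up for illustration and are not taken from the fitted model in the question. It raises one regressor by one unit, holds the others fixed, and compares the ratio of the two odds with `np.exp` of that coefficient.

```python
import numpy as np

# Hypothetical coefficients and one observation (illustrative values only).
beta = np.array([-0.5, 0.8, 0.3])
x = np.array([1.2, -0.4, 2.0])

def odds(x, beta):
    """Odds p / (1 - p) under a logit model, which equals exp(x @ beta)."""
    p = 1.0 / (1.0 + np.exp(-(x @ beta)))
    return p / (1.0 - p)

j = 1                 # index of the regressor to increase by one unit
x_plus = x.copy()
x_plus[j] += 1.0      # all other regressors stay fixed

odds_ratio = odds(x_plus, beta) / odds(x, beta)

# The contributions of the other regressors cancel in the ratio,
# so only exp(beta[j]) remains.
print(np.isclose(odds_ratio, np.exp(beta[j])))
```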

To also get the confidence intervals (source):

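# Assumes the model was fit on pandas objects, so conf_int() returns a DataFrame.
params = res.params
conf = res.conf_int()  # default alpha=0.05, i.e. the 2.5% and 97.5% bounds
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))  # exponentiates the bounds and the point estimates together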
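
The snippet above relies on `res.conf_int()` returning a DataFrame, which is the case when the model is fit on pandas objects. When the model is fit on NumPy arrays, as in the question, `res.params` and `res.conf_int()` are plain arrays. A minimal sketch for that case follows. `odds_ratio_table` and the default `x1, x2, ...` names are hypothetical, introduced only for this example, and the sample numbers are copied from the summary table in the question. The bounds are labeled 2.5% and 97.5% because `conf_int()` defaults to `alpha=0.05`.

```python
import numpy as np
import pandas as pd

def odds_ratio_table(params, conf_int, names=None):
    """Build an odds-ratio table from log-odds coefficients and their CI bounds.

    Hypothetical helper: `params` has shape (k,), `conf_int` has shape (k, 2).
    """
    params = np.asarray(params, dtype=float)
    conf_int = np.asarray(conf_int, dtype=float)
    if names is None:
        names = [f"x{i + 1}" for i in range(len(params))]
    table = pd.DataFrame(
        {
            "2.5%": conf_int[:, 0],
            "97.5%": conf_int[:, 1],
            "Odds Ratio": params,
        },
        index=names,
    )
    # exp is monotonic, so transforming the endpoints gives the CI of the odds ratio.
    return np.exp(table)

# Coefficients and 95% bounds copied from the summary in the question.
params = [-0.0033, 0.0565, 0.2985]
bounds = [[-0.359, 0.352], [-0.362, 0.475], [-0.125, 0.723]]
table = odds_ratio_table(params, bounds)
print(table)
```

With a fitted result, the call would be `odds_ratio_table(res.params, res.conf_int())`.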

Disclaimer: I've just put together the comments to your question.

answered Dec 10 '17 at 16:22 by lincolnfrias · edited Feb 18 '20 at 17:44 by mc51

I think you forgot to use np.exp(res.params) when assigning params as odds ratios in your code block. – JCM Jun 28 '22 at 15:35
He applied the exponent in the print call – oriel9p Jul 31 '22 at 17:57

score 2 · Answer 2 · answered Jan 08 '21 at 14:07

I'm not sure about statsmodels, but here is how to do it in sklearn:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# x is a pandas DataFrame of features, y is the binary target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

# coef_ has shape (1, n_features) for a binary target, so take its single row
df = pd.DataFrame({'odds_ratio': np.exp(logisticRegr.coef_[0]), 'variable': x.columns.tolist()})

df = df.sort_values('odds_ratio', ascending=False)
df

watch out! sklearn uses a regularized regression by default which biases the coef_ numbers. best to use statsmodels if your primary interest is the model coefficients as opposed to the model predictions. — benten, Jan 30 '23 at 20:07
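
To illustrate the comment above, here is a small sketch on synthetic data. The data-generating coefficients and the value `C=1e9` are arbitrary choices made for this example. `LogisticRegression` applies an L2 penalty with `C=1.0` by default; a very large `C` makes the penalty negligible, which approximates an unpenalized fit without relying on version-specific `penalty` options.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with known log-odds coefficients (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([2.0, -1.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.random(200) < p).astype(int)

# Default settings: L2 penalty with C=1.0.
default_fit = LogisticRegression().fit(X, y)
# Very large C: the penalty term becomes negligible.
weak_penalty_fit = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)

or_default = np.exp(default_fit.coef_[0])
or_weak = np.exp(weak_penalty_fit.coef_[0])

print("odds ratios, default penalty:   ", or_default)
print("odds ratios, negligible penalty:", or_weak)
```

The default penalty shrinks the coefficient vector toward zero, which tends to pull the reported odds ratios toward 1.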

score 0 · Answer 3 · answered Oct 13 '21 at 14:54

As an option basically equivalent to lincolnfrias' one, but maybe more handy (and directly usable in stargazer tables), consider the following:

from stargazer.utils import LogitOdds

odds = LogitOdds(original_logit_model)

See this stargazer issue for more background.
