11

I'm wondering how I can get the odds ratios from a fitted logistic regression model in Python statsmodels.

>>> import statsmodels.api as sm
>>> import numpy as np
>>> X = np.random.normal(0, 1, (100, 3))
>>> y = np.random.choice([0, 1], 100)
>>> res = sm.Logit(y, X).fit()
Optimization terminated successfully.
         Current function value: 0.683158
         Iterations 4
>>> res.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Sun, 05 Jun 2016   Pseudo R-squ.:                0.009835
Time:                        23:25:06   Log-Likelihood:                -68.316
converged:                       True   LL-Null:                       -68.994
                                        LLR p-value:                    0.5073
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0033      0.181     -0.018      0.985        -0.359     0.352
x2             0.0565      0.213      0.265      0.791        -0.362     0.475
x3             0.2985      0.216      1.380      0.168        -0.125     0.723
==============================================================================
"""
>>> 
Donbeo
  • Some info here: http://blog.yhat.com/posts/logistic-regression-and-python.html – BrenBarn Jun 05 '16 at 22:38
  • According to the site, `OR=np.exp(res.params)`. I'm not 100% sure that that formula is right – Donbeo Jun 05 '16 at 22:53
  • Is your question about the math of how to get the odds ratio, or the programming of how to get it from statsmodels? See for instance the very end of [this page](http://www.ats.ucla.edu/stat/stata/faq/oratio.htm), which says "The end result of all the mathematical manipulations is that the odds ratio can be computed by raising e to the power of the logistic coefficient". – BrenBarn Jun 05 '16 at 23:00
  • The point is that I'm not sure that this is true in multivariate regression. i.e. If more than one input variable is used. – Donbeo Jun 05 '16 at 23:37
  • If your question is about the stats involved, you're probably better off asking on [Cross Validated](http://stats.stackexchange.com/). – BrenBarn Jun 05 '16 at 23:41
  • I did some time ago http://stats.stackexchange.com/questions/208136/odds-ratio-vs-confidence-interval-in-logistic-regression. This is why I think the formula is wrong. – Donbeo Jun 06 '16 at 00:07
  • @Donbeo I'm not sure what that answer means. oddsratios are exp(params) in Logit, and you can get the confidence interval for the oddsratios by endpoint transformation by just using exp(confint()) where confint is for the estimated parameters. – Josef Jun 06 '16 at 00:12
  • see for example Stata's `eform` http://www.stata.com/manuals14/rglm.pdf which has the interpretation for Logit, Poisson, and similar applies to a few more other models that are based on an exp transformation, eg. hazard ratio, IIRC. – Josef Jun 06 '16 at 00:16
  • can you confirm `OR=exp(coef)` in multivariate logistic regression? – Donbeo Jun 06 '16 at 00:19
  • Yes, that's what I'm saying, confirmed (because exp makes it multiplicative so other terms cancel in the ratio). However, oddsratio is usually used for binary 0-1 regressors, otherwise you would have to look at the interpretation of the effect of a unit change or of the slope effect of a continuous variable. – Josef Jun 06 '16 at 00:27
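The cancellation Josef describes can be checked numerically. The sketch below uses made-up coefficients (`beta` is hypothetical, not a fitted result) and compares the odds before and after a one-unit increase in a single variable while the others stay fixed.

```python
import numpy as np

def odds(beta, x):
    """Odds p / (1 - p) implied by a logistic model with linear predictor x @ beta."""
    return np.exp(x @ beta)

# Hypothetical coefficients for a three-variable model (not fitted values).
beta = np.array([0.4, -1.2, 0.7])

x = np.array([0.5, 2.0, -1.0])
x_plus_one = x.copy()
x_plus_one[1] += 1.0  # raise the second variable by one unit, hold the rest fixed

# The contributions of the other variables appear in both odds and cancel.
ratio = odds(beta, x_plus_one) / odds(beta, x)
print(ratio, np.exp(beta[1]))  # the two numbers agree
```

The same check passes for any starting point `x`, which is why the odds ratio for a variable does not depend on the values of the other regressors.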

3 Answers

20

You can get the odds ratio with:

np.exp(res.params)

To also get the confidence intervals (source):

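# Assumes the model was fit on pandas objects, so conf_int() returns a DataFrame
params = res.params
conf = res.conf_int()  # default alpha=0.05, so the bounds are the 2.5% and 97.5% quantiles
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))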

Disclaimer: I've just put together the comments to your question.
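When the model is fit on plain NumPy arrays, as in the question, `res.params` and `res.conf_int()` come back as arrays rather than pandas objects, so column assignment is not available. A minimal sketch of the same endpoint transformation on arrays follows; `odds_ratio_table` is a hypothetical helper, not a statsmodels function, and the numbers are copied from the summary table in the question.

```python
import numpy as np

def odds_ratio_table(params, conf_int):
    """Return an (n, 3) array with columns: odds ratio, lower bound, upper bound."""
    params = np.asarray(params, dtype=float)
    conf_int = np.asarray(conf_int, dtype=float)
    # Endpoint transformation: exponentiate the estimate and both interval bounds.
    return np.exp(np.column_stack([params, conf_int]))

# Coefficients and 95% bounds copied from the summary table in the question.
coef = [-0.0033, 0.0565, 0.2985]
bounds = [[-0.359, 0.352], [-0.362, 0.475], [-0.125, 0.723]]

table = odds_ratio_table(coef, bounds)
for name, (odds, lo, hi) in zip(["x1", "x2", "x3"], table):
    print(f"{name}: OR={odds:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

With a fitted result, the call would be `odds_ratio_table(res.params, res.conf_int())`.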

mc51
lincolnfrias
2

I'm not sure about statsmodels, but here is how to do it in sklearn:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# x is a pandas DataFrame of features, y a binary target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

# coef_ has shape (1, n_features) for a binary target, so take the first row
df = pd.DataFrame({'odds_ratio': np.exp(logisticRegr.coef_[0]), 'variable': x.columns.tolist()})

df = df.sort_values('odds_ratio', ascending=False)
df
  • watch out! sklearn uses a regularized regression by default which biases the coef_ numbers. best to use statsmodels if your primary interest is the model coefficients as opposed to the model predictions. – benten Jan 30 '23 at 20:07
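The effect of the default penalty can be seen on simulated data. This is a rough sketch, assuming scikit-learn's default `LogisticRegression` settings (L2 penalty, `C=1.0`); the data, the `true_coef` values, and the choice of `C=1e9` to make the penalty negligible are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Simulated data with known log-odds coefficients (hypothetical values).
true_coef = np.array([1.0, -2.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef)))
y = rng.binomial(1, p)

# Default settings apply an L2 penalty with C=1.0, which shrinks coef_ toward zero.
penalized = LogisticRegression().fit(X, y)
# A very large C makes the penalty negligible, approximating plain maximum likelihood.
nearly_unpenalized = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)

print("odds ratios, default penalty:   ", np.exp(penalized.coef_[0]))
print("odds ratios, negligible penalty:", np.exp(nearly_unpenalized.coef_[0]))
```

The penalized coefficients are pulled toward zero, so the corresponding odds ratios are pulled toward 1.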
0

As an option basically equivalent to lincolnfrias' one, but maybe more handy (and directly usable in stargazer tables), consider the following:

from stargazer.utils import LogitOdds

odds = LogitOdds(original_logit_model)

See this stargazer issue for more background.

Pietro Battiston