
Patsy, which is nicely integrated into Statsmodels, lets you write R-style formulas as strings.

import statsmodels.formula.api as smf

res = smf.ols("Wealth ~ Age + Income + Happy", data=df).fit()
print(res.summary())

This displays the summary of my regression, but the order of the parameters doesn't seem to follow any rule, e.g.:

                           OLS Regression Results                            
==============================================================================
Dep. Variable:                 Wealth   R-squared:                       0.309
Model:                            OLS   Adj. R-squared:                  0.283
Method:                 Least Squares   F-statistic:                     12.06
Date:                Tue, 28 Feb 2017   Prob (F-statistic):           1.32e-06
Time:                        21:36:08   Log-Likelihood:                -377.13
No. Observations:                  85   AIC:                             762.3
Df Residuals:                      81   BIC:                             772.0
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          38.6348     15.825      2.441      0.017       7.149      70.121
Income             -0.3522      0.334     -1.056      0.294      -1.016       0.312
Age                 0.4364      0.283      1.544      0.126      -0.126       0.999
Happy              -0.0005      0.006     -0.085      0.933      -0.013       0.012
==============================================================================
Omnibus:                        4.447   Durbin-Watson:                   1.953
Prob(Omnibus):                  0.108   Jarque-Bera (JB):                3.228
Skew:                          -0.332   Prob(JB):                        0.199
Kurtosis:                       2.314   Cond. No.                     1.40e+04
==============================================================================

As a result, it is quite cumbersome to find the parameter I am looking for.

Is there any way to force the summary output to list the parameters in the same order as in the input formula string?
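To make the goal concrete, here is a minimal sketch of what "same order as the formula" would look like if done by hand. It reads the column order the fitted model actually used, then rebuilds a small coefficient table reindexed to the desired order. The synthetic `df` and the names `wanted` and `table` are assumptions for illustration only; this does not change what `res.summary()` prints.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (assumption: four numeric columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(85, 4)),
    columns=["Wealth", "Age", "Income", "Happy"],
)

res = smf.ols("Wealth ~ Age + Income + Happy", data=df).fit()

# Column order the fitted model actually used.
print(res.model.exog_names)

# Desired order, written out by hand to match the formula string.
wanted = ["Intercept", "Age", "Income", "Happy"]

# Rebuild a coefficient table and reindex it to the desired order.
table = pd.DataFrame(
    {
        "coef": res.params,
        "std err": res.bse,
        "t": res.tvalues,
        "P>|t|": res.pvalues,
    }
).reindex(wanted)
print(table)
```

The table covers only the coefficient block of the summary, so it is a way to look parameters up quickly rather than a replacement for the full report.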

Adrien Pacifico
  • I haven't checked in detail, but patsy works hard to preserve the order information, so I suspect something is going wrong inside statsmodels after patsy finishes processing the formula. I'd suggest filing a bug on statsmodels. – Nathaniel J. Smith Apr 26 '18 at 21:28
  • AFAIK, statsmodels never reorders columns of the provided design matrix. – Josef Apr 26 '18 at 22:35
  • Well, if you look at the page http://www.statsmodels.org/dev/example_formulas.html, which explains how to use Patsy formulas with Statsmodels, the canonical example is 'Lottery ~ Literacy + Wealth + Region', and the result is returned in the order: Region (4 categories), Literacy, Wealth. But I was looking more for a practical solution to my problem. Currently the best way to proceed is not to use Patsy formulas at all (which is far less practical when variables need to be interacted). – Adrien Pacifico Apr 27 '18 at 17:20

0 Answers