
Reviewing linear regressions fit via statsmodels OLS, I see that you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However, my only understanding of an intercept in this context is the value of y on our line when x equals 0, so I'm not clear what purpose injecting a '1' everywhere serves. What is this constant actually telling the OLS fit?

Tim Lindsey

2 Answers


It doesn't add a constant to your values, it adds a constant term to the linear equation it is fitting. In the single-predictor case, it's the difference between fitting a line y = mx to your data vs. fitting y = mx + b.
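For concreteness, here's a minimal sketch on made-up data (the variable names and numbers are just illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(scale=0.5, size=100)  # true m = 3, b = 5

# Without add_constant the model is y = m*x, forced through the origin.
through_origin = sm.OLS(y, x).fit()
print(through_origin.params)  # one coefficient: m only

# add_constant adds a column of ones to x; the coefficient fitted to
# that column is the intercept b, so the model becomes y = m*x + b.
X = sm.add_constant(x)
with_intercept = sm.OLS(y, X).fit()
print(with_intercept.params)  # two coefficients: [b, m], roughly [5, 3]
```

The column of ones is just how OLS represents `b` internally: the intercept is treated as one more coefficient, attached to a "predictor" that always equals 1.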

BrenBarn
  • so all the constant is doing is indicating there *is* a "b" in the equation? – Tim Lindsey Dec 31 '16 at 02:46
  • @TimLindsey: In essence, yes. It tells the model to fit a value for `b` as well as coefficients for your predictors. I've never really understood why statsmodels requires you to add this explicitly, since as described [here](http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-a-linear-regression-model) you pretty much always want to do it unless you have a specific justification for not doing so. – BrenBarn Dec 31 '16 at 02:52

sm.add_constant in statsmodels plays the same role as sklearn's fit_intercept parameter in LinearRegression(). If you don't call sm.add_constant, or if you set LinearRegression(fit_intercept=False), then both statsmodels and sklearn assume that b = 0 in y = mx + b, and they'll fit the model with b fixed at 0 instead of estimating what b should be from your data.
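A quick sketch showing the equivalence (made-up data; the numbers are only illustrative):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 7.0 + rng.normal(scale=0.3, size=50)

# statsmodels: the intercept comes from the added column of ones.
sm_fit = sm.OLS(y, sm.add_constant(X)).fit()
print(sm_fit.params)  # [b, m], roughly [7, 2]

# sklearn: the intercept is handled by fit_intercept=True (the default).
sk_fit = LinearRegression(fit_intercept=True).fit(X, y)
print(sk_fit.intercept_, sk_fit.coef_)  # roughly 7 and [2]

# Leaving out add_constant / setting fit_intercept=False fixes b at 0,
# which distorts the slope when the true intercept isn't 0.
sk_no_b = LinearRegression(fit_intercept=False).fit(X, y)
print(sk_no_b.intercept_, sk_no_b.coef_)
```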

wi3o