
Reviewing linear regressions fit via statsmodels OLS, I see that you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However, my only understanding of an intercept in this context is the value of y on our line when x equals 0, so I'm not clear what purpose injecting a '1' everywhere serves. What is this constant actually telling the OLS fit?

Tim Lindsey

2 Answers


It doesn't add a constant to your values, it adds a constant term to the linear equation it is fitting. In the single-predictor case, it's the difference between fitting a line y = mx to your data vs. fitting y = mx + b.
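For concreteness, here's a minimal sketch on made-up data (the variable names and numbers are just illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(scale=0.5, size=100)  # true m = 3, b = 5

# Without add_constant the model is y = m*x, forced through the origin.
through_origin = sm.OLS(y, x).fit()
print(through_origin.params)  # one coefficient: m only

# add_constant adds a column of ones to x; the coefficient fitted to
# that column is the intercept b, so the model becomes y = m*x + b.
X = sm.add_constant(x)
with_intercept = sm.OLS(y, X).fit()
print(with_intercept.params)  # two coefficients: [b, m], roughly [5, 3]
```

The column of ones is just how OLS represents `b` internally: the intercept is treated as one more coefficient, attached to a "predictor" that always equals 1.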

BrenBarn
  • so all the constant is doing is indicating there *is* a "b" in the equation? – Tim Lindsey Dec 31 '16 at 02:46
  • @TimLindsey: In essence, yes. It tells the model to fit a value for `b` as well as coefficients for your predictors. I've never really understood why statsmodels requires you to add this explicitly, since as described [here](http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-a-linear-regression-model) you pretty much always want to do it unless you have a specific justification for not doing so. – BrenBarn Dec 31 '16 at 02:52

sm.add_constant in statsmodels plays the same role as sklearn's fit_intercept parameter in LinearRegression(). If you don't call sm.add_constant, or if you set LinearRegression(fit_intercept=False), then both statsmodels and sklearn assume that b = 0 in y = mx + b, and they'll fit the model with b fixed at 0 instead of estimating what b should be from your data.
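A quick sketch showing the equivalence (made-up data; the numbers are only illustrative):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 7.0 + rng.normal(scale=0.3, size=50)

# statsmodels: the intercept comes from the added column of ones.
sm_fit = sm.OLS(y, sm.add_constant(X)).fit()
print(sm_fit.params)  # [b, m], roughly [7, 2]

# sklearn: the intercept is handled by fit_intercept=True (the default).
sk_fit = LinearRegression(fit_intercept=True).fit(X, y)
print(sk_fit.intercept_, sk_fit.coef_)  # roughly 7 and [2]

# Leaving out add_constant / setting fit_intercept=False fixes b at 0,
# which distorts the slope when the true intercept isn't 0.
sk_no_b = LinearRegression(fit_intercept=False).fit(X, y)
print(sk_no_b.intercept_, sk_no_b.coef_)
```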

wi3o