
Hi, I'm learning statsmodels and can't figure out the difference between `:` and `*` (interaction terms) in formulas for statsmodels OLS regression. Could you please give me a hint?

Thank you!

The documentation: http://statsmodels.sourceforge.net/devel/example_formulas.html

user3368526
The most complete explanation is in the patsy documentation, http://patsy.readthedocs.org/en/latest/formulas.html (statsmodels uses patsy for its formulas). This question, http://stackoverflow.com/questions/23672466/interaction-effects-in-patsy-with-patsy-dmatrices-giving-duplicate-columns-for, also explains some of the difference between `:` and `*`. – Josef Oct 10 '15 at 12:22

2 Answers


`:` adds only the interaction term to the model, without the main effects of the variables involved.

`*` adds the main effect of each variable plus their interaction.

For example:

a. GLMmodel = glm("y ~ a:b", data=df)

Here the only predictor (besides the intercept, which the formula adds by default) is the interaction a:b. For numeric variables, that is the product of "a" and "b".

b. GLMmodel = glm("y ~ a * b", data=df)

Here there are three predictors (besides the intercept): "a" itself, "b" itself, and the interaction a:b.
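One way to see the difference is to look at the design matrix columns that each formula produces. The sketch below assumes patsy is installed (statsmodels uses it for formulas, as the comment on the question notes) and uses a small made-up DataFrame with two numeric columns; the names `df`, `colon_matrix`, and `star_matrix` are only for illustration.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Synthetic numeric data, purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=8), "b": rng.normal(size=8)})

# Right-hand side only: build the design matrix for each formula.
colon_matrix = dmatrix("a:b", df)
star_matrix = dmatrix("a*b", df)

colon_columns = colon_matrix.design_info.column_names
star_columns = star_matrix.design_info.column_names
print(colon_columns)  # intercept plus the interaction only
print(star_columns)   # intercept, both main effects, and the interaction

# For numeric variables the interaction column is the elementwise product.
product_matches = np.allclose(
    np.asarray(colon_matrix)[:, 1], (df["a"] * df["b"]).to_numpy()
)
print(product_matches)
```

If either variable is categorical, the interaction expands into several columns rather than a single product, so the column counts above apply to the numeric case only.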

Bussller
Yaron

A*B is just shorthand for A + B + A:B.

A:B specifies the interaction itself. For numeric variables, this is literally the product of the two variables. It rarely makes sense to fit a model with only this term, so we almost always fit the main effects, A and B, too (see here for reasons why). Because this is so common, many statistical software packages provide the shorthand notation A*B.
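The equivalence can be checked directly by fitting both spellings and comparing the estimates. This is a minimal sketch using `statsmodels.formula.api.ols` on synthetic data; the coefficients used to generate `y` are arbitrary assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a known interaction effect (values are arbitrary).
rng = np.random.default_rng(1)
df = pd.DataFrame({"A": rng.normal(size=50), "B": rng.normal(size=50)})
df["y"] = (
    1.0 + 2.0 * df["A"] - 1.0 * df["B"] + 0.5 * df["A"] * df["B"]
    + rng.normal(scale=0.1, size=50)
)

# Shorthand and expanded forms of the same model.
fit_star = smf.ols("y ~ A * B", data=df).fit()
fit_long = smf.ols("y ~ A + B + A:B", data=df).fit()

term_names = list(fit_star.params.index)
same_terms = term_names == list(fit_long.params.index)
same_values = np.allclose(fit_star.params.values, fit_long.params.values)
print(term_names)
print(same_terms, same_values)
```

Both fits contain the same terms with the same estimates, which is what "shorthand" means here.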

Robert Long