With linear models, the interaction term is specified with `:` and terms are separated by `+`, so a model with both the single and interaction terms is
lm(y ~ x1:x2 + x1 + x2)
However, you can also write `x1*x2`, which includes both the interaction and the single effects, so the following is equivalent to the above:
lm(y ~ x1*x2)
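One way to confirm the equivalence is to let R expand both formulas and then compare the fits. This is a minimal sketch using the built-in `iris` data; the object names `m_explicit` and `m_star` are only illustrative.

```r
# terms() shows how R expands a formula, without needing any data;
# main effects are listed before the interaction in both cases
attr(terms(y ~ x1:x2 + x1 + x2), "term.labels")  # "x1" "x2" "x1:x2"
attr(terms(y ~ x1 * x2), "term.labels")          # "x1" "x2" "x1:x2"

# Fit both spellings of the same model on the built-in iris data
m_explicit <- lm(Petal.Length ~ Petal.Width:Sepal.Length + Petal.Width + Sepal.Length,
                 data = iris)
m_star <- lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris)

# Same coefficients (compared by name) and same residual degrees of freedom
all.equal(coef(m_explicit)[names(coef(m_star))], coef(m_star))
c(df.residual(m_explicit), df.residual(m_star))
```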
For example, with the built-in dataset `iris` and the fixed effects specified as `Petal.Width*Sepal.Length`, all three terms appear in the model summary:
Call:
lm(formula = Petal.Length ~ Petal.Width * Sepal.Length, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max
-0.99588 -0.24329  0.00355  0.29735  1.24780

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              -3.24804    0.59586  -5.451 2.08e-07 ***
Petal.Width               2.97115    0.35836   8.291 6.74e-14 ***
Sepal.Length              0.87551    0.11667   7.504 5.60e-12 ***
Petal.Width:Sepal.Length -0.22248    0.06384  -3.485  0.00065 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3888 on 146 degrees of freedom
Multiple R-squared:  0.9525,    Adjusted R-squared:  0.9515
F-statistic: 975.4 on 3 and 146 DF,  p-value: < 2.2e-16
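The pieces of that summary can also be extracted programmatically, which is a convenient way to confirm that all three terms were estimated. A minimal sketch; `fit` and `coef_table` are illustrative names.

```r
fit <- lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris)

# One row per estimated parameter: intercept, two main effects, interaction
coef_table <- coef(summary(fit))
rownames(coef_table)

# Named vector with the F value and its two degrees of freedom:
# numdf = 3 model terms, dendf = 150 observations - 4 parameters = 146
summary(fit)$fstatistic
```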
As to what the comma is doing in your models: it is creating a subset. Compare the summaries of the following three models. The first two have 146 and 147 residual degrees of freedom: they use all 150 data points and estimate 4 and 3 parameters, respectively. The third model, which mimics your specification, has only 129 degrees of freedom, and that is what made me realise it was subsetting. Checking the documentation for `lm()`, there is an argument for subsetting: `lm(formula, data, subset, ...)`. Because `data` is named explicitly, the two unnamed arguments are matched by position to the first two remaining parameters, `formula` and `subset`. You can also see this in the model summary, where the call includes a `subset =` argument.
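To check the positional matching directly, this sketch compares the unnamed call with one that names `subset` explicitly. The object names `m_positional` and `m_named` are illustrative.

```r
# formals() lists lm()'s parameters in order; once data = iris is matched
# by name, the unnamed arguments fill the remaining slots from left to right
names(formals(lm))[1:3]   # "formula" "data" "subset"

m_positional <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
                   Petal.Width * Sepal.Length, data = iris)
m_named <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
              data = iris, subset = Petal.Width * Sepal.Length)

# The stored call uses the matched names, so the second argument appears as subset
m_positional$call
all.equal(coef(m_positional), coef(m_named))
```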
summary(lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris))
summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, data = iris))
summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, Petal.Width * Sepal.Length, data = iris))
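The degrees-of-freedom comparison for the three models above can be tabulated with `nobs()`, `coef()` and `df.residual()`. A minimal sketch; the list names are illustrative. For each model the residual degrees of freedom equal the number of rows used minus the number of estimated coefficients, so the third model's lower count comes from the rows it uses, not from its terms.

```r
models <- list(
  interaction = lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris),
  additive    = lm(Petal.Length ~ Petal.Width + Sepal.Length, data = iris),
  subsetted   = lm(Petal.Length ~ Petal.Width + Sepal.Length,
                   Petal.Width * Sepal.Length, data = iris)
)

# Rows used, parameters estimated, and residual df for each model
df_table <- data.frame(
  n_obs  = sapply(models, nobs),
  n_coef = sapply(models, function(m) length(coef(m))),
  df_res = sapply(models, df.residual)
)
df_table
```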
Your result can be recreated by using this vector, `iris$Petal.Width * iris$Sepal.Length`, as row numbers. Be careful: R truncates non-integer row indices towards zero (so values below 1 are dropped), which means some rows are reused many times and many others are skipped. The result of this model therefore does not match one that uses all the data, with each data point used exactly once.
summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, data = iris[iris$Petal.Width * iris$Sepal.Length, ]))
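The sketch below shows which rows those indices actually select. It relies on R's documented rule that numeric indices are truncated towards zero and zero indices are dropped. Because the largest possible product is 7.9 * 2.5 = 19.75 (the maximum sepal length times the maximum petal width), every selected index lies between 1 and 19, so only the first 19 rows, all setosa, can ever be picked. The names `idx`, `rows_used`, `m_subset` and `m_manual` are illustrative.

```r
idx <- iris$Petal.Width * iris$Sepal.Length

# Truncation towards zero: values below 1 become 0 and are dropped
rows_used <- trunc(idx[idx >= 1])
range(idx)          # products stay below 20
table(rows_used)    # how often each row is reused

m_subset <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
               data = iris, subset = Petal.Width * Sepal.Length)
m_manual <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
               data = iris[idx, ])

# Same fit either way, built only from setosa rows
all.equal(coef(m_subset), coef(m_manual))
unique(iris$Species[rows_used])
```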