
y ~ x + x:x seems to be equivalent to just y ~ x, while y ~ x + I(x^2) correctly includes the quadratic term in the model.

Why can't I write a quadratic term as the interaction of a variable with itself?

jay.sf
robertspierre

2 Answers


Great question. A start at an answer is that while :, in practice, typically multiplies the numeric columns associated with the variable (e.g. if x and y are both numeric, x:y creates an interaction column that is the product of x and y), the root meaning of : in R's formula syntax is not "multiply columns" but "form an interaction". The interaction of a variable with itself is just itself.
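You can see this collapse directly by asking terms() and model.matrix() what they build. The sketch below uses a small simulated data frame d, made up purely for illustration. It also shows the related point that ^ inside a formula means "crossing" rather than exponentiation, which is why I() is needed to get arithmetic squaring.

```r
# Hypothetical toy data, used only to inspect the design matrices
set.seed(1)
d <- data.frame(x = rnorm(5), y = rnorm(5))

# terms() treats x:x as the interaction of x with itself, i.e. just x
attr(terms(y ~ x + x:x), "term.labels")
# -> "x"

# So the design matrix has no squared column ...
colnames(model.matrix(y ~ x + x:x, data = d))
# -> "(Intercept)" "x"

# ... whereas I() shields ^ from formula parsing, so x^2 is computed as arithmetic
colnames(model.matrix(y ~ x + I(x^2), data = d))
# -> "(Intercept)" "x" "I(x^2)"

# Likewise, ^ on a formula term means crossing up to that order, not a power
attr(terms(y ~ (x + z)^2), "term.labels")
# -> "x" "z" "x:z"
```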

I would love to have a complete formal description of R's version of Wilkinson-Rogers syntax (which is what this is), but I don't know that one exists. The original framing of the formula language is in Wilkinson and Rogers (1973) [where . rather than : was used for the "interaction" operator]; I believe there's a description in the "White Book" (Chambers and Hastie 1992); but other than that, I think the only full definition is the source code of model.matrix() itself (which is not all that nice to look at ...)


Chambers, J. M., and T. Hastie, eds. Statistical Models in S. Wadsworth & Brooks/Cole, 1992.

Wilkinson, G. N., and C. E. Rogers. “Symbolic Description of Factorial Models for Analysis of Variance.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 22, no. 3 (1973): 392–99. https://doi.org/10.2307/2346786.

Ben Bolker

To write polynomial terms compactly, we can use poly(). With raw = TRUE its coefficients match those of the explicit I() terms:

coef(lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2) + I(Sepal.Length^3), iris))
# (Intercept)      Sepal.Length I(Sepal.Length^2) I(Sepal.Length^3) 
#  19.8028068       -13.5808046         2.8767502        -0.1742277

coef(lm(Petal.Length ~ poly(Sepal.Length, 3, raw = TRUE), iris))
# (Intercept) poly(Sepal.Length, 3, raw = TRUE)1 
#  19.8028068                        -13.5808046 
# poly(Sepal.Length, 3, raw = TRUE)2 poly(Sepal.Length, 3, raw = TRUE)3 
#                          2.8767502                         -0.1742277
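Note that poly() defaults to raw = FALSE, which builds orthogonal polynomials. Their coefficients differ from the raw ones above, but they span the same column space, so the fitted model is the same. A minimal check (the object names fit_orth and fit_raw are just illustrative):

```r
# Orthogonal (default) vs raw polynomial bases for the same cubic fit
fit_orth <- lm(Petal.Length ~ poly(Sepal.Length, 3), data = iris)
fit_raw  <- lm(Petal.Length ~ poly(Sepal.Length, 3, raw = TRUE), data = iris)

# Coefficients differ, but fitted values agree up to numerical tolerance
all.equal(fitted(fit_orth), fitted(fit_raw))
# -> TRUE
```

Orthogonal polynomials are numerically better conditioned, while raw = TRUE is convenient when you want coefficients directly comparable to the I(x^2), I(x^3) parameterization.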
jay.sf