
I'd like a compact way to use R's formula notation -- or some other formalism -- to include all the quadratic terms among a set of variables A through E, excluding the D:E interaction. (My real problem has a longer list of variables of the A-C type and of the D-E type.)

I wrote a little function to check my work based on this post (Thanks, @Gregor!).

expand_form <- function(FUN){
  out <- reformulate(labels(terms(FUN)), FUN[[2]])
  out
}

I thought this would do it:

f <- y ~ (A + B + C + D + E)^2 - D:E
> expand_form(f)
y ~ A + B + C + D + E + A:B + A:C + A:D + A:E + B:C + B:D + B:E + 
    C:D + C:E
<environment: 0x00000218fc153928>

but it does not include the single-variable squared terms. Of course I could just explicitly add those terms as A:A, B:B, etc. -- or no, actually. I just tried that and it has no effect on the output of expand_form(). And neither does adding A^2, B^2, etc. terms. Not sure if this is a problem with my formula or with my expand_form() function.

I looked at five or six posts on related topics, but none seemed to provide a compact solution in formula notation, which I am assuming exists.

In response to @Maurits Evers' very clear and helpful comments/answer below, I want to clarify my question to more clearly recognize:

  1. that the thing I want to do is what most people will want to do in certain contexts; and
  2. that I now recognize that R's standard formula notation, used in its usual way, does not do this.

If you have numeric variables, all your two-way interaction terms are second-degree polynomial terms. In that context, if you include the interaction of a variable with itself (which you don't have to do), you clearly do not want it interpreted as a second copy of the variable, nor do you want it removed. If that were what you wanted, you would simply not include the self-interaction terms. But this is what R's standard formula notation does: it interprets the interaction of a numeric variable with itself as identical to the variable, and then removes it as redundant. So formulas that include self-interactions are identical to formulas that don't. I think that is never the behavior one would prefer in a model where all the variables are numeric.

If always removing self-interaction terms is the behavior you want – and with dummy variables it is – R’s formula notation allows you to express any pattern of interactions, including some very complex patterns, very concisely. But the only way I have found to express patterns of interaction that treat self-interactions as squared terms is to individually write out all the squared terms. This is awkward and verbose and in models with a lot of variables I think it will often lead to error. So it seems reasonable to me that in this context the interaction term should normally be the square.

So the question is: is there any straightforward way to tell R's formula notation to treat self-interaction as squaring? Alternatively, is there a concise way to write such models if the formula notation cannot do it?

I think this is partially a disciplinary difference. Econometrics is primarily a quasi-experimental field, and we have to take our treatments as we find them. So the treatment effect interpretation of dummy variables does not come as naturally to us.

andrewH

1 Answer


This is not quite an answer to your question but a bit too long for a comment, so I'm posting this here. Will remove if this is not the right place.

In my opinion your question boils down to why

terms(y ~ A)

and

terms(y ~ A^2)

correspond to the same model structure.

The reason why you don't see the quadratic term is that in R's formula syntax A^2 corresponds to the interaction term A:A; this in turn is nothing but y ~ A, since a model that depends on the interaction of A with A is just a model that depends on A.

In general, to include quadratic terms you'd then have to use the "AsIs" operator I(), so

terms(y ~ I(A^2))

would give you an additive model

y = beta[0] + beta[1] * A^2
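
To make the comparison concrete, here is a small sketch that inspects the term labels of each variant. It only uses base R's `terms()` and `labels()`; the variables do not need to exist, because no data is evaluated at this stage.

```r
# Term labels for the plain, "squared", and self-interaction formulas.
# The first three all reduce to the single term A.
labels(terms(y ~ A))
labels(terms(y ~ A^2))
labels(terms(y ~ A:A))

# Wrapping the expression in I() protects ^ from the formula parser,
# so the term is kept as a genuine arithmetic square.
labels(terms(y ~ I(A^2)))
```

The last call is the only one whose term label differs from `A`.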

Coming back to your original question, it seems that terms does not work when you have multiple terms inside I(), so I'm not sure if there exists a simple solution to your question.
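
One possible workaround, offered as a sketch rather than a definitive solution, is to build the formula as a string and let a small helper append the `I(x^2)` terms. The helper name `quad_formula` and its arguments are made up for this illustration and are not part of base R; it relies only on `paste()`, `as.formula()`, `terms()`, and `labels()`.

```r
# Hypothetical helper (not part of base R): all main effects and pairwise
# interactions via ( ... )^2, plus explicit squared terms via I(),
# minus any interactions listed in `drop`.
quad_formula <- function(response, vars, drop = character()) {
  squares <- paste0("I(", vars, "^2)")
  rhs <- paste0("(", paste(vars, collapse = " + "), ")^2 + ",
                paste(squares, collapse = " + "))
  if (length(drop) > 0) {
    # Each dropped term is removed with the formula "-" operator.
    rhs <- paste(rhs, "-", paste(drop, collapse = " - "))
  }
  as.formula(paste(response, "~", rhs))
}

f <- quad_formula("y", c("A", "B", "C", "D", "E"), drop = "D:E")
labs <- labels(terms(f))
labs
```

With the five variables above, `labs` should contain the five main effects, the five `I(x^2)` terms, and nine of the ten pairwise interactions, with `D:E` excluded. For a longer variable list only the `vars` and `drop` arguments change.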


Update

I think your question stems from some confusion about what the ^ operator in R's formula syntax does.

To recap (see ?formula):

  1. The * operator denotes factor crossing: a * b is short for a + b + a:b
  2. The ^ operator indicates crossing to the specified degree

For example

y ~ (x1 + x2)^2

is the same as

y ~ (x1 + x2) * (x1 + x2)

which is the same as

y ~ x1 + x2 + x1:x2

In words (again based on explanations given in ?formula), (x1 + x2)^2 expands to a formula containing the main effects of x1 and x2 together with their second order interaction. It does not include second order (quadratic) main effects, and does not expand to x1 + I(x1^2) + x2 + I(x2^2) + x1:x2.

By the same argument y ~ x1^2 expands to y ~ x1 * x1 which in turn expands to y ~ x1. Again, no quadratic main effect.
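
The expansion can be checked directly. The sketch below compares the term labels of the three spellings and then looks at the design matrix columns; the data frame `d` is made-up data used purely for illustration.

```r
# The three spellings describe the same set of terms.
labels(terms(y ~ (x1 + x2)^2))
labels(terms(y ~ (x1 + x2) * (x1 + x2)))
labels(terms(y ~ x1 + x2 + x1:x2))

# Made-up numeric data, purely for illustration.
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 0, 1, 5))

# Crossing to degree 2: intercept, main effects, and the x1:x2 product only.
mm <- model.matrix(~ (x1 + x2)^2, data = d)
colnames(mm)

# Squared main effects appear only when requested explicitly with I().
mm_sq <- model.matrix(~ (x1 + x2)^2 + I(x1^2) + I(x2^2), data = d)
colnames(mm_sq)
```

For numeric variables the `x1:x2` column is the elementwise product of `x1` and `x2`, while the `I(x1^2)` column holds the squared values of `x1`.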

Maurits Evers
  • I don't understand why the interaction of A with A is A and not A^2. I mean, if A is a dummy variable then A^2 is A, but all my variables are numeric. Is there some second set of operators for use with numeric variables? – andrewH Nov 19 '19 at 01:16
  • `A^2` is the interaction of `A` with itself (it's *not* a quadratic term); in an additive linear model `y ~ A^2` this is then (trivially) the same as `y ~ A`. You need `y ~ I(A^2)` to include a quadratic term. This applies to categorical and continuous variables alike. I don't understand what you mean by *"Is there some second set of operators for use with numeric variables?"* – Maurits Evers Nov 19 '19 at 01:22
  • Upon re-reading your comment and post, I guess your question is why the interaction term `A:A` (which is the same as `A^2`) does not get translated into a quadratic term. Is that what you're asking? If so, I would suggest editing your original post to highlight this point, perhaps with a more minimal code example (e.g. using `y ~ A^2` vs. `y ~ I(A^2)` vs. `y ~ A`). – Maurits Evers Nov 19 '19 at 01:39
  • @andrewH I've added an update in an attempt to clarify on some of the previous points I made. Please take a look. – Maurits Evers Nov 20 '19 at 00:11
  • Thank you so much for your extremely thoughtful and helpful remarks. I have tried to clarify my question to explain why one might want to treat self-interaction as squaring, and to move away from the original insistence that this must be done via the standard formula notation. So I cannot take your current response as an answer, but I have given you an upvote. – andrewH Dec 04 '19 at 18:25