Flexible comparison of formula equivalence in R?

Question

Consider these various R formulas with interactions:

x ~ a + b + c + d + c:d + a:b
x ~ c + a + b + d + a:b + d:c
x ~ a + b + c + d + c * d + a * b
x ~ a * b + c * d
x ~ b * a + c * d

For the purposes of something like a linear model, these are all equivalent. Say I had a large set of formulas and wanted to check whether there were any duplicates, including non-obvious duplicates like the ones above. Is there a simple way to do this kind of comparison?

There are three challenges:

Have to remove redundancies (d + c * d is equivalent to c * d)
Have to be able to match elements in different orders (a + b same as b + a)
Have to be able to match commuted interactions (c:d is the same as d:c)

Just a terms() call with some sorting doesn't seem to get at it, mostly because of the last one.
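
To make that last point concrete, here is a minimal base-R check (the names `f_a`, `f_b`, and `naive_match` are illustrative, not from any package). It only shows where the naive sort-and-compare falls short; it is not a fix.

```r
# Base R only. f_a and f_b are illustrative names.
f_a <- x ~ a * b + c * d
f_b <- x ~ b * a + c * d

labels_a <- attr(terms(f_a), "term.labels")
labels_b <- attr(terms(f_b), "term.labels")

print(labels_a)
print(labels_b)

# Sorting fixes the order of the terms, but not the order of the
# variables inside an interaction label, so the comparison still fails.
naive_match <- identical(sort(labels_a), sort(labels_b))
print(naive_match)

# Comparing the labels as sets shows which spellings disagree.
print(setdiff(labels_a, labels_b))
print(setdiff(labels_b, labels_a))
```

Redundancy removal and term ordering are therefore not the real obstacle; the variable order inside each interaction label is.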

Here's how I worked it out so far (written as a functional sequence for ease of reading):

library(tidyverse)  # provides %>%, str_split(), and map_chr()

get.terms <- {
  . %>%              
    terms %>%                  # use terms to get the parts
    attr("term.labels") %>%    # character vector of elements
    str_split(":") %>%         # separate interaction terms (makes list)                        
    map_chr(                   # go through each list item
      ~.x %>% 
      sort %>%                 # if multiples (interaction), sort
      paste0(collapse = ":")   # combine back 
    ) %>%                      # output (now standardized) term list
    sort                       # sort the term list for comparison
  }

# Which gives:
get.terms(x ~ a + b + c + d + c:d + a:b)
get.terms(x ~ c + a + b + d + a:b + d:c)
get.terms(x ~ a + b + c + d + c * d + a * b)
get.terms(x ~ a * b + c * d)
get.terms(x ~ b * a + c * d)

# so you can test:
all.equal(get.terms(x ~ b * a + c * d), get.terms(x ~ c + a + b + d + a:b + d:c))

# would have to add more for this, though:
all.equal(get.terms(foo ~ b * a + c * d), get.terms(bar ~ c + a + b + d + a:b + d:c))

But this seems hacky for such a fundamental part of R.

I realize that you could probably shorten this a bit with a list element comparison nearer the end, but the extra steps are intentional, as the idea is to be able to construct a standardized, human-readable formula notation, too. It's more that the whole process, especially the interaction term flips, seems like it shouldn't be necessary.

Anyone know an easier or more canonical way to do this?

Bonus points if it can incorporate potential left-hand-side differences as well.

Double bonus if it can output a standardized formula format (or string equivalent).

Possible duplicate of [expanding factor interactions within a formula](https://stackoverflow.com/questions/11595392/expanding-factor-interactions-within-a-formula) — Brigadeiro, Aug 09 '19 at 18:08
This is not a duplicate, but elements from there could probably be used in an answer. — Roman Luštrik, Aug 09 '19 at 18:10

score 4 · Answer 1 · answered Aug 09 '19 at 18:15

I wonder if this idea could be expanded to something more general.

frm1 <- x ~ a + b + c + d + c:d + a:b
frm2 <- x ~ c + a + b + d + a:b + d:c

identical(
  sort(attr(terms.formula(frm1), "term.labels")),
  sort(attr(terms.formula(frm2), "term.labels"))
)

[1] TRUE
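
One way this idea might be generalized, sketched here under assumptions rather than offered as a canonical solution: read the variables of each term from the `"factors"` attribute of the terms object instead of from the label text, so the spelling of an interaction no longer matters. The helper names `canonical_terms()` and `canonical_formula()` are hypothetical. The sketch treats every non-zero entry of the factors matrix alike, and it assumes formulas without `.` on the right-hand side.

```r
# Sketch only: canonical_terms() and canonical_formula() are hypothetical
# helper names, not part of base R or any package.

canonical_terms <- function(f) {
  tt  <- terms(f)
  fac <- attr(tt, "factors")
  # A formula with no terms (e.g. y ~ 1) has an empty "factors" attribute.
  if (length(fac) == 0L) return(character(0))
  # Each column is one term; its non-zero rows are the variables involved.
  labs <- apply(fac, 2, function(col) {
    paste(sort(rownames(fac)[col > 0], method = "radix"), collapse = ":")
  })
  # Main effects first, then higher orders, alphabetical within each order.
  unname(labs[order(attr(tt, "order"), labs, method = "radix")])
}

canonical_formula <- function(f) {
  has_int <- attr(terms(f), "intercept") == 1L
  # Two-sided formulas have length 3; the response is the second element.
  lhs <- if (length(f) == 3L) paste(deparse(f[[2]]), collapse = " ") else ""
  rhs <- canonical_terms(f)
  if (length(rhs) == 0L) {
    rhs <- if (has_int) "1" else "0"
  } else {
    rhs <- paste(rhs, collapse = " + ")
    if (!has_int) rhs <- paste(rhs, "- 1")
  }
  trimws(paste(lhs, "~", rhs))
}

forms <- list(
  x ~ a + b + c + d + c:d + a:b,
  x ~ c + a + b + d + a:b + d:c,
  x ~ a + b + c + d + c * d + a * b,
  x ~ a * b + c * d,
  x ~ b * a + c * d
)
strings <- vapply(forms, canonical_formula, character(1))
print(unique(strings))

# The response is part of the string, so differing left-hand sides differ.
print(canonical_formula(foo ~ b * a) == canonical_formula(bar ~ b * a))
```

Because the result is a plain string, duplicates in a large set of formulas could then be found with `duplicated(strings)`, and `as.formula()` would turn a string back into a formula if needed.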

Roman Luštrik

The idea just with `frm1` and `frm5` above will fail. A `strsplit` would be needed, which is not so difficult to code. – Rui Barradas Aug 09 '19 at 18:36
aha. `d:c` gets rearranged to `c:d` because `c` and `d` have previously appeared in the formula. – Ben Bolker Aug 09 '19 at 18:49
This works because the terms list labels each interaction according to the order in which its variables first appear as main effects. In both your examples, `c` comes before `d`, but it doesn't generalize if those get switched. Consider these two additional formulas: `frm3 <- x ~ a + b + d + a:b + d*c` and `frm4 <- x ~ a + b + d + c + a:b + d:c`. (Apologies, hadn't refreshed to see the comments above come in.) – Joe Aug 09 '19 at 18:56
Rui, isn't a strsplit basically what I originally presented? (just using the *stringr* version) – Joe Aug 09 '19 at 18:58
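
For reference, the `strsplit` variant discussed in these comments could look roughly like this in base R. It is a sketch: `normalize_labels()` and `naive()` are hypothetical helper names, and `frm5` is assumed to mean the fifth formula from the question.

```r
# Sketch only: normalize_labels() and naive() are hypothetical helpers.

normalize_labels <- function(f) {
  labs  <- attr(terms(f), "term.labels")
  # Split each label into its variables, sort them, and glue them back.
  parts <- strsplit(labs, ":", fixed = TRUE)
  out   <- vapply(parts, function(p) paste(sort(p), collapse = ":"),
                  character(1))
  sort(out)
}

# The plain sort-and-compare from the answer, for contrast.
naive <- function(f) sort(attr(terms(f), "term.labels"))

frm1 <- x ~ a + b + c + d + c:d + a:b
frm3 <- x ~ a + b + d + a:b + d*c
frm4 <- x ~ a + b + d + c + a:b + d:c
frm5 <- x ~ b * a + c * d  # assumed: the fifth formula in the question

print(identical(naive(frm1), naive(frm5)))
print(identical(normalize_labels(frm1), normalize_labels(frm5)))
print(identical(naive(frm1), naive(frm3)))
print(identical(normalize_labels(frm1), normalize_labels(frm3)))
print(identical(normalize_labels(frm3), normalize_labels(frm4)))
```

Splitting on `:` assumes that no variable or function call in the formula contains a colon of its own, for example a term such as `I(a:b)`; labels like that would be split in the wrong place.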
