
I am generating a big list of factors with different levels, and I want to be able to detect when two of them define the same partition. For example, I want to detect all of the following as equivalent to each other:

x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))
x3 <- factor(c("x", "x", "y", "y", "z", "z", "x", "x"))
x4 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"), levels=c("b", "c", "a"))

What is the best way to do this?

tonytonov
Ryan C. Thompson

1 Answer


I guess you want to establish that the two-way cross-classification has the same number of populated levels as the one-way classification. By default, `interaction` represents every combination of levels, even combinations that never occur, but setting `drop=TRUE` keeps only the populated ones, which suits your purpose:

> levels (interaction(x1,x2, drop=TRUE) )
[1] "c.a" "b.b" "a.c"
> length(levels(x1) ) == length(levels(interaction(x1,x2,drop=TRUE) ) )
[1] TRUE
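To see what `drop=TRUE` changes, compare the level counts with and without it. A small sketch, reusing `x1` and `x2` from the question:

```r
x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))

# Default (drop = FALSE): every combination of levels is kept, 3 x 3 of them
nlevels(interaction(x1, x2))
#[1] 9

# drop = TRUE: only the combinations that actually occur in the data
nlevels(interaction(x1, x2, drop = TRUE))
#[1] 3
```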

The generalization wraps the three necessary comparisons in `all()`:

 all( length(levels(x1) ) == length(levels(interaction(x1,x2,drop=TRUE) ) ),
      length(levels(x1) ) == length(levels(interaction(x1,x3,drop=TRUE) ) ),
      length(levels(x1) ) == length(levels(interaction(x1,x4,drop=TRUE) ) ) )
#[1] TRUE
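One caveat: comparing only against `length(levels(x1))` can miss two cases. Unused levels in `x1` inflate its count (a false negative), and a factor that merges some of `x1`'s groups (a coarser partition) still produces the same number of populated combinations, so the check above would also report `TRUE` for the `x5` below. A minimal sketch of a symmetric check, assuming no missing values; `same_partition` and `x5` are hypothetical names introduced only for illustration:

```r
x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))
x3 <- factor(c("x", "x", "y", "y", "z", "z", "x", "x"))
x4 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"), levels=c("b", "c", "a"))
# Hypothetical example: a coarser factor that merges x1's groups "b" and "c"
x5 <- factor(c("a", "a", "b", "b", "b", "b", "a", "a"))

# Hypothetical helper: TRUE when f and g split the observations into the
# same groups. Assumes no NAs; unused levels are ignored.
same_partition <- function(f, g) {
  stopifnot(length(f) == length(g))
  n_f  <- length(unique(f))                        # populated groups in f
  n_g  <- length(unique(g))                        # populated groups in g
  n_fg <- nlevels(interaction(f, g, drop = TRUE))  # populated (f, g) pairs
  # The pairing is one-to-one exactly when all three counts agree
  n_f == n_fg && n_g == n_fg
}

facs <- list(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
vapply(facs, same_partition, logical(1), g = x1)
# x1 to x4 give TRUE, x5 gives FALSE
```

Checking both directions is what rules out the coarser case: `x5` has only 2 populated groups but forms 3 populated pairs with `x1`.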
IRTFM
  • I find it useful to visualize this method with `table(x1, x2)`. You can see that each column (and row) has only a single non-zero entry. – bdemarest Sep 29 '12 at 00:28
  • To use `table(x1,x2)` in a programmatic fashion you would need something like `sum(table(x1,x2) != 0 )`. – IRTFM Sep 29 '12 at 00:51
  • `interaction` can be slow for large vectors; using `paste` instead can speed this up. – Empiromancer Mar 02 '17 at 22:28
  • I'm always willing to learn new things, but I do so better by seeing well constructed demonstrations. – IRTFM Mar 03 '17 at 01:15
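The `paste` suggestion could look something like the sketch below. It counts distinct pairs by pasting the two factors' integer level codes rather than their labels, so labels that happen to contain the separator cannot collide. `same_partition_paste` is a hypothetical name, the function assumes both inputs are factors without NAs, and whether it actually beats `interaction` on your data is worth measuring rather than assuming.

```r
# Hypothetical helper: counts populated (f, g) pairs by pasting integer
# level codes instead of calling interaction(). Assumes factors, no NAs.
same_partition_paste <- function(f, g) {
  stopifnot(is.factor(f), is.factor(g), length(f) == length(g))
  # Integer codes joined by a space cannot collide, unlike arbitrary labels
  pair_keys <- paste(as.integer(f), as.integer(g))
  n_fg <- length(unique(pair_keys))
  # One-to-one pairing: each factor has as many populated groups as there are pairs
  length(unique(f)) == n_fg && length(unique(g)) == n_fg
}

x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x3 <- factor(c("x", "x", "y", "y", "z", "z", "x", "x"))
x5 <- factor(c("a", "a", "b", "b", "b", "b", "a", "a"))  # illustrative, coarser than x1

same_partition_paste(x1, x3)  # TRUE
same_partition_paste(x1, x5)  # FALSE
```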