How do I filter out combinations where the order doesn't matter?

Question

I have a tibble of combinations of 1, 2, and 3:

> tidyr::crossing(X1 = c(1,2,3), X2 = c(1,2,3))
# A tibble: 9 × 2
     X1    X2
  <dbl> <dbl>
1     1     1
2     1     2
3     1     3
4     2     1
5     2     2
6     2     3
7     3     1
8     3     2
9     3     3

I would like to filter out rows 4, 7, and 8 in this example, because the order does not matter. That is to say, (1, 2) is the same as (2, 1).

Is there a way to generate combinations like this, or filter out rows if a dataframe has already been generated?

Will the vectors be the same every time, or are you asking more generally? For example, in your post `X1 = X2`. If this is always the case, you are using the wrong approach (i.e. generating the Cartesian product). All you need is combinations with repetition, and many packages provide this functionality. See the canonical question for R here: [How to generate permutations and combinations in R](https://stackoverflow.com/q/22569176/4408538) - Joseph Wood, Mar 11 '23 at 01:20
If instead you are asking more generally and each `X_i` could be different, then any approach provided thus far will become inefficient very quickly, because the number of rows to filter grows very large. - Joseph Wood, Mar 11 '23 at 01:23
There is also this post: [Non-redundant version of expand.grid](https://stackoverflow.com/q/17171148/4408538). It covers some of the same approaches posted here. - Joseph Wood, Mar 11 '23 at 01:34
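
To put rough numbers on the efficiency point raised in the comments: when all r columns share the same n values, the Cartesian product has n^r rows, but only choose(n + r - 1, r) of them are distinct once order is ignored. A minimal sketch in base R (n = 10 and r = 5 are arbitrary values chosen for illustration):

```r
n <- 10  # number of distinct values (illustrative)
r <- 5   # number of columns (illustrative)

# Rows produced by crossing() / expand.grid()
rows_cartesian <- n^r

# Rows that survive once order is ignored (combinations with repetition)
rows_unordered <- choose(n + r - 1, r)

rows_cartesian  # 100000
rows_unordered  # 2002
```

In this example roughly 98% of the generated rows would be thrown away by a filter, which is why generating the combinations directly tends to scale better.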

Maël · Accepted Answer · 2023-03-10T11:12:54.740

You can filter rows where X1 is less than or equal to X2:

library(dplyr)
library(tidyr)
crossing(X1 = c(1,2,3), X2 = c(1,2,3)) %>% 
  filter(X1 <= X2)

#      X1    X2
# 1     1     1
# 2     1     2
# 3     1     3
# 4     2     2
# 5     2     3
# 6     3     3
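
The same filter can be written in base R, which also makes it easy to sanity-check the row count: for n distinct values there are n(n + 1)/2 unordered pairs with repetition. A small sketch, assuming both columns are drawn from the same vector `x` (the names `grid` and `kept` are only illustrative):

```r
x <- c(1, 2, 3)
grid <- expand.grid(X1 = x, X2 = x)

# Keep one representative per unordered pair: the one with X1 <= X2
kept <- grid[grid$X1 <= grid$X2, ]

n <- length(x)
nrow(kept) == n * (n + 1) / 2  # TRUE
```

If the two columns come from different sets of values, this filter can drop pairs whose mirror image never appears in the table, so it is best suited to the identical-columns case.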

For a more general solution, see gtools::combinations with repeats.allowed = TRUE:

gtools::combinations(3, 3, c(1, 2, 3), repeats.allowed = TRUE)

#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3
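
If adding a package is not an option, the same kind of result can be sketched in base R with the "stars and bars" mapping: each strictly increasing index set drawn from 1..(n + r - 1) corresponds to exactly one non-decreasing index set in 1..n. Note that `combos_with_rep` is a hypothetical helper written for this illustration, not a function from gtools or any other package:

```r
# Hypothetical helper: combinations with repetition of size r from v
combos_with_rep <- function(v, r) {
  v <- sort(unique(v))
  n <- length(v)
  # Strictly increasing index sets from 1..(n + r - 1), one per row
  idx <- t(combn(n + r - 1, r))
  # Subtract 0, 1, ..., r - 1 column-wise to get non-decreasing indices in 1..n
  idx <- sweep(idx, 2, 0:(r - 1))
  # v[idx] returns values in column-major order, so reshape back to r columns
  matrix(v[idx], ncol = r)
}

combos_with_rep(c(1, 2, 3), 2)  # 6 rows
combos_with_rep(c(1, 2, 3), 3)  # 10 rows
```

This still materialises choose(n + r - 1, r) index rows, so for large inputs a dedicated package is likely the better choice.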

Or with is.unsorted rowwise:

crossing(X1 = c(1,2,3), X2 = c(1,2,3), X3 = c(1,2,3)) %>% 
  rowwise() %>% 
  filter(!is.unsorted(across(X1:X3)))

Made simpler in base R:

df <- expand.grid(X1 = c(1,2,3), X2 = c(1,2,3), X3 = c(1,2,3))
df[!apply(df, 1, is.unsorted), ]
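
`is.unsorted()` also accepts a `strictly` argument. As a sketch, it could be used when rows with repeated values should be dropped too (combinations without repetition); whether that is wanted depends on the use case. The names `with_rep` and `without_rep` are only illustrative:

```r
df <- expand.grid(X1 = c(1, 2, 3), X2 = c(1, 2, 3), X3 = c(1, 2, 3))

# Non-decreasing rows: combinations with repetition
with_rep <- df[!apply(df, 1, is.unsorted), ]

# Strictly increasing rows: combinations without repetition
without_rep <- df[!apply(df, 1, is.unsorted, strictly = TRUE), ]

nrow(with_rep)     # 10
nrow(without_rep)  # 1
```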

benson23 · Answer 2 · 2023-03-10T11:37:25.417

We can build a key from the row-wise minimum and maximum of the two columns, then keep one row per key.

This works for both numeric and character columns.

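library(dplyr)

# df is the tibble from the question
df <- tidyr::crossing(X1 = c(1, 2, 3), X2 = c(1, 2, 3))

df %>%
  # order-independent key; the separator keeps e.g. (1, 23) and (12, 3) distinct
  mutate(tmp = paste(pmin(X1, X2), pmax(X1, X2), sep = "_")) %>%
  slice_head(n = 1, by = tmp) %>%
  select(-tmp)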

# A tibble: 6 × 2
     X1    X2
  <dbl> <dbl>
1     1     1
2     1     2
3     1     3
4     2     2
5     2     3
6     3     3
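
As a variation on the min/max idea, here is a base R sketch for character columns that skips the pasted string key and calls `duplicated()` on the (min, max) pair directly. The names `df_chr`, `lo`, `hi` and `dedup` are made up for this example:

```r
df_chr <- expand.grid(
  X1 = c("a", "b", "c"),
  X2 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Row-wise smaller and larger value; pmin/pmax also work on character vectors
lo <- pmin(df_chr$X1, df_chr$X2)
hi <- pmax(df_chr$X1, df_chr$X2)

# Keep the first row seen for each unordered pair
dedup <- df_chr[!duplicated(data.frame(lo, hi)), ]

nrow(dedup)  # 6
```

Because the first occurrence of each pair is kept, which orientation survives depends on the row order of the input.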

edited Mar 10 '23 at 11:37

answered Mar 10 '23 at 11:30

Darren Tsai · Answer 3 · 2023-03-10T12:59:39.280

You can repeat the vector twice and use combn:

unique(t(combn(rep(1:3, each = 2), 2)))

#      [,1] [,2]
# [1,]    1    1
# [2,]    1    2
# [3,]    1    3
# [4,]    2    2
# [5,]    2    3
# [6,]    3    3

Note that the repeated vector passed to combn must be sorted in non-decreasing order, i.e. you have to set each = 2 rather than times = 2; otherwise, order-swapped duplicates such as (1, 2) and (2, 1) remain:

unique(t(combn(rep(1:3, times = 2), 2)))

#      [,1] [,2]
# [1,]    1    2
# [2,]    1    3
# [3,]    1    1
# [4,]    2    3
# [5,]    2    1
# [6,]    2    2
# [7,]    3    1
# [8,]    3    2
# [9,]    3    3

ThomasIsCoding · Answer 4 · answered Mar 10 '23 at 13:03

I hope this trick applies to your case

x <- 1:3
data.frame(
  X1 = rep(x, rev(seq_along(x))),
  X2 = sequence(rev(seq_along(x)), x)
)

which gives

#   X1 X2
# 1  1  1
# 2  1  2
# 3  1  3
# 4  2  2
# 5  2  3
# 6  3  3
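
The code above uses the values of `x` as starting points for `sequence()`, so it relies on `x` being 1, 2, ..., n. A possible generalisation, sketched here under the assumption that `x` holds distinct values, is to generate positions and index into `x` (the names `k` and `res` are illustrative, and `sequence()` must accept a `from` argument, as the original code already requires):

```r
x <- c("a", "b", "c")  # any vector of distinct values
k <- seq_along(x)

res <- data.frame(
  X1 = rep(x, rev(k)),
  # Positions 1..n, 2..n, ..., n, used to index into x
  X2 = x[sequence(rev(k), from = k)]
)

nrow(res)  # 6
```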

ThomasIsCoding
