I have a tibble of all combinations of 1, 2, and 3:

> tidyr::crossing(X1 = c(1,2,3), X2 = c(1,2,3))
# A tibble: 9 × 2
     X1    X2
  <dbl> <dbl>
1     1     1
2     1     2
3     1     3
4     2     1
5     2     2
6     2     3
7     3     1
8     3     2
9     3     3

I would like to filter out rows 4, 7, and 8 in this example, because the order does not matter: (1, 2) is the same as (2, 1).

Is there a way to generate combinations like this directly, or to filter out the redundant rows once a data frame has already been generated?

Slash
  • Will the vectors be the same every time, or are you asking more generally? For example, in your post `X1 = X2`. If this is always the case, you are using the wrong approach (i.e. generating the Cartesian product). All you need is combinations with repetition, and many packages provide this functionality. See the canonical question for R here: [How to generate permutations and combinations in R](https://stackoverflow.com/q/22569176/4408538) – Joseph Wood Mar 11 '23 at 01:20
  • If instead you are asking more generally and each `X_i` could be different, then any approach provided thus far will become very inefficient very quickly as the sheer number of values to filter will become very large. – Joseph Wood Mar 11 '23 at 01:23
  • There is also this post [Non-redundant version of expand.grid](https://stackoverflow.com/q/17171148/4408538). You will see some of the same approaches posted here. – Joseph Wood Mar 11 '23 at 01:34

4 Answers

You can keep only the rows where X1 is less than or equal to X2:

library(dplyr)
library(tidyr)
crossing(X1 = c(1,2,3), X2 = c(1,2,3)) %>% 
  filter(X1 <= X2)

#      X1    X2
# 1     1     1
# 2     1     2
# 3     1     3
# 4     2     2
# 5     2     3
# 6     3     3

For a more general solution, see gtools::combinations with repeats = TRUE (a partial match for its repeats.allowed argument):

gtools::combinations(3, 3, c(1, 2, 3), repeats = TRUE)

#       [,1] [,2] [,3]
#  [1,]    1    1    1
#  [2,]    1    1    2
#  [3,]    1    1    3
#  [4,]    1    2    2
#  [5,]    1    2    3
#  [6,]    1    3    3
#  [7,]    2    2    2
#  [8,]    2    2    3
#  [9,]    2    3    3
# [10,]    3    3    3

Or with is.unsorted rowwise:

crossing(X1 = c(1,2,3), X2 = c(1,2,3), X3 = c(1,2,3)) %>% 
  rowwise() %>% 
  # c_across() collects the row's values into a plain vector for is.unsorted()
  filter(!is.unsorted(c_across(X1:X3)))

Made simpler in base R:

df <- expand.grid(X1 = c(1,2,3), X2 = c(1,2,3), X3 = c(1,2,3))
df[!apply(df, 1, is.unsorted), ]
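
As a quick sanity check on the filtering approaches above: the number of order-independent combinations of r values drawn with repetition from n distinct values is choose(n + r - 1, r). The following is a minimal base R sketch that reuses the expand.grid example; the names vals, n, r, and kept are only illustrative.

```r
vals <- c(1, 2, 3)
n <- length(vals)  # number of distinct values
r <- 3             # number of columns

df <- expand.grid(X1 = vals, X2 = vals, X3 = vals)

# Keep rows whose entries are already in non-decreasing order
kept <- df[!apply(df, 1, is.unsorted), ]

# Multiset coefficient: combinations of r items from n, with repetition
nrow(kept) == choose(n + r - 1, r)
```

If the comparison returns FALSE for your own data, the columns probably do not all share the same set of values.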
Maël

We can build an order-independent key from the row-wise minimum and maximum of the two columns, then keep only the first row for each key.

This works for both numeric and character columns.

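library(dplyr)  # slice_head(by = ) needs dplyr >= 1.1.0

# The tibble from the question
df <- tidyr::crossing(X1 = c(1, 2, 3), X2 = c(1, 2, 3))

df %>% 
  # The separator keeps pairs such as (1, 12) and (11, 2) distinct
  mutate(tmp = paste(pmin(X1, X2), pmax(X1, X2), sep = "_")) %>% 
  slice_head(n = 1, by = tmp) %>% 
  select(-tmp)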

# A tibble: 6 × 2
     X1    X2
  <dbl> <dbl>
1     1     1
2     1     2
3     1     3
4     2     2
5     2     3
6     3     3
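
The same idea (an order-independent key per row) can be extended to more than two columns. This is a hedged base R sketch, not a drop-in replacement: it sorts the values within each row and keeps the first row of every group with identical sorted values. It assumes there are no NA values, and the names sorted_rows and res are only illustrative.

```r
df <- expand.grid(X1 = c(1, 2, 3), X2 = c(1, 2, 3), X3 = c(1, 2, 3))

# Sort the values within each row; apply() returns them column-wise,
# so transpose to get one sorted row per original row
sorted_rows <- t(apply(df, 1, sort))

# duplicated() on a matrix compares whole rows; keep first occurrences only
res <- df[!duplicated(sorted_rows), ]
nrow(res)
```

Because it keeps the first occurrence rather than testing X1 <= X2, this variant also applies when the columns do not all contain the same values, although it still has to generate the full grid first.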
benson23

You can repeat each element of the vector twice and use combn:

unique(t(combn(rep(1:3, each = 2), 2)))

#      [,1] [,2]
# [1,]    1    1
# [2,]    1    2
# [3,]    1    3
# [4,]    2    2
# [5,]    2    3
# [6,]    3    3

Note that the repeated vector passed into combn must be sorted in non-decreasing order, i.e. you have to set each = 2 rather than times = 2; otherwise, order-swapped duplicates such as (2, 1) and (1, 2) both remain:

unique(t(combn(rep(1:3, times = 2), 2)))

#      [,1] [,2]
# [1,]    1    2
# [2,]    1    3
# [3,]    1    1
# [4,]    2    3
# [5,]    2    1
# [6,]    2    2
# [7,]    3    1
# [8,]    3    2
# [9,]    3    3
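
An alternative sketch avoids generating duplicates and calling unique() at all. It relies on the standard correspondence between strictly increasing r-subsets of 1:(n + r - 1) and non-decreasing r-tuples of 1:n. The names x, r, n, idx, and res are only illustrative, and x is assumed to be sorted and free of duplicates.

```r
x <- 1:3          # values to choose from (assumed sorted and distinct)
r <- 2            # size of each combination
n <- length(x)

# All strictly increasing r-subsets of 1:(n + r - 1), one per row
subsets <- t(combn(n + r - 1, r))

# Subtract 0, 1, ..., r - 1 from columns 1..r to get non-decreasing indices
idx <- subsets - rep(0:(r - 1), each = nrow(subsets))

# Translate the indices back into the values of x
res <- matrix(x[as.vector(idx)], ncol = r)
res
```

For the example above this produces the same six rows as the each = 2 version, and it generates exactly choose(n + r - 1, r) rows instead of filtering a larger set.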
Darren Tsai

I hope this trick applies to your case (it relies on x being consecutive integers, and the from argument of sequence needs R >= 4.0.0):

x <- 1:3
data.frame(
  X1 = rep(x, rev(seq_along(x))),
  X2 = sequence(rev(seq_along(x)), x)
)

which gives

  X1 X2
1  1  1
2  1  2
3  1  3
4  2  2
5  2  3
6  3  3
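
For values that are not consecutive integers, a variant of the same trick can index x by position instead of using its values as starting points. This is a minimal sketch under that assumption; the vector c(10, 20, 30) and the names counts and res are hypothetical, and the from argument of sequence() needs R >= 4.0.0.

```r
x <- c(10, 20, 30)          # hypothetical non-consecutive values
n <- length(x)
counts <- rev(seq_len(n))   # n, n - 1, ..., 1

res <- data.frame(
  # Repeat the i-th value n - i + 1 times
  X1 = rep(x, counts),
  # sequence() yields positions 1..n, 2..n, ..., n; use them to index x
  X2 = x[sequence(counts, from = seq_len(n))]
)
res
```

Each value in X1 is paired only with itself and the values that follow it in x, so no order-swapped pair is ever generated.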
ThomasIsCoding