
I want to get all unique pairwise combinations of the values in a string column (whose values are unique) of a dataframe, ideally using the tidyverse.

Here is a dummy example:

library(tidyverse)

a <- letters[1:3] %>% 
        tibble::as_tibble()
a
#> # A tibble: 3 x 1
#>   value
#>   <chr>
#> 1     a
#> 2     b
#> 3     c

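# Name the inputs directly: crossing(a, a) can fail in newer tidyr versions,
# which reject two inputs that both contribute a column called `value`
tidyr::crossing(words1 = a$value, words2 = a$value)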
#> # A tibble: 9 x 2
#>   words1 words2
#>    <chr>  <chr>
#> 1      a      a
#> 2      a      b
#> 3      a      c
#> 4      b      a
#> 5      b      b
#> 6      b      c
#> 7      c      a
#> 8      c      b
#> 9      c      c

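Is there a way to remove the 'duplicate' combinations here? That is, I want to drop self-pairs such as (a, a) and keep only one of each mirrored pair such as (a, b) / (b, a), so that the output in this example is: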

#> # A tibble: 3 x 2
#>   words1 words2
#>    <chr>  <chr>
#> 1      a      b
#> 2      a      c
#> 3      b      c
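For reference, this target set is exactly what base R's utils::combn() returns when choosing 2 elements at a time. utils ships with R, so it adds no package dependency. A minimal sketch, assuming the words are already unique (the names `words`, `m` and `pairs` are only illustrative):

```r
library(tibble)

words <- c("a", "b", "c")

# combn() returns one combination per column; t() turns that into one pair per row
m <- t(utils::combn(words, 2))

pairs <- tibble(words1 = m[, 1], words2 = m[, 2])
pairs
```

The pairs come out in the order of the input vector, not alphabetically.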

I was hoping for a neat purrr::map or dplyr::filter step that I could pipe the result into to get the output above.

EDIT: There are similar questions to this one, e.g. here, flagged as duplicates by @Sotos. Here I am specifically looking for tidyverse (purrr, dplyr) ways to complete the pipeline I have set up. The other answers use various other packages that I do not want to include as dependencies.
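As an illustration of the kind of purrr step being asked for, here is a minimal sketch, assuming the words are unique and there are at least two of them. It pairs each word only with the words that come after it, so self-pairs and mirrored pairs are never generated in the first place. `words`, `n` and `pairs` are hypothetical names; purrr::map_dfr() is superseded in purrr 1.0 but still available, and it needs dplyr installed to bind the rows.

```r
library(dplyr)
library(purrr)
library(tibble)

words <- c("a", "b", "c")
n <- length(words)

# For each position i, pair words[i] with every word after it; tibble()
# recycles the single words[i] to the length of words[(i + 1):n]
pairs <- map_dfr(seq_len(n - 1), function(i) {
  tibble(words1 = words[i], words2 = words[(i + 1):n])
})

pairs
```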

NelsonGon
user4687531
  • @Sotos - I have already read that question. I am asking specifically for solutions that use tidyverse packages, in particular purrr::map. Please remove the duplicate flag – user4687531 Sep 29 '17 at 18:11

2 Answers


I wish there were a better way, but I usually use this:

library(tidyverse)

df <- tibble(value = letters[1:3])

df %>% 
  expand(value, value1 = value) %>%   # all ordered pairs of the unique values
  filter(value < value1)              # keep one ordering per pair, drop self-pairs

# # A tibble: 3 x 2
#   value value1
#   <chr> <chr> 
# 1 a     b     
# 2 a     c     
# 3 b     c  
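The strict `<` is what does the work here: it is false for a self-pair such as (a, a), and for two distinct values exactly one of (x, y) and (y, x) satisfies it. A minimal sketch checking that this holds for a larger, deliberately unsorted input, so the result has choose(n, 2) rows (the five words are made-up data):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical input: five distinct words, not in alphabetical order
df <- tibble(value = c("pear", "apple", "fig", "kiwi", "plum"))

pairs <- df %>%
  expand(value, value1 = value) %>%  # 5 x 5 = 25 ordered pairs
  filter(value < value1)             # drops 5 self-pairs and one of each mirrored pair

nrow(pairs)  # 10, i.e. choose(5, 2)
```

Which word ends up in which column depends on the string collation of the current locale, but the number of rows does not.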
CJ Yetman
  • I am trying to solve the same issue but am now getting the error `Error: Column name `value` must not be duplicated.` when I use your solution :-( – user2716568 Aug 28 '20 at 11:58
  • Edited to work with the current tidyverse version (tidyverse recently introduced stricter column name checking) – CJ Yetman Aug 29 '20 at 13:08

Something like this?

tidyr::crossing(words1 = a$value, words2 = a$value) %>%  # all ordered pairs
  rowwise() %>%
  mutate(first  = sort(c(words1, words2))[1],     # sort order of words for each row,
         second = sort(c(words1, words2))[2]) %>% # both computed from the original columns
  ungroup() %>%
  select(words1 = first, words2 = second) %>%
  filter(words1 != words2) %>%                    # remove word combinations with itself
  distinct()                                      # remove duplicates

# A tibble: 3 x 2
  words1 words2
   <chr>  <chr>
1      a      b
2      a      c
3      b      c
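The idea in this answer is to put each pair into a canonical (alphabetical) order so that (b, a) and (a, b) become the same row, after which duplicates can simply be dropped. A minimal sketch of the same canonicalization without rowwise(), swapping in base R's vectorized pmin()/pmax() (which accept character vectors) for the per-row sort(); the small input tibble is made-up data:

```r
library(dplyr)
library(tibble)

# Ordered pairs, including both orders of a pair and a self-pair
pairs <- tibble(words1 = c("b", "a", "c", "a"),
                words2 = c("a", "b", "a", "a"))

canonical <- pairs %>%
  mutate(first  = pmin(words1, words2),     # alphabetically smaller word of each row
         second = pmax(words1, words2)) %>% # alphabetically larger word of each row
  filter(first != second) %>%               # drop self-pairs such as (a, a)
  distinct(first, second)                   # (b, a) and (a, b) collapse into one row

canonical
```

Working on whole columns at once avoids the per-row overhead of rowwise(), which matters more as the number of pairs grows.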
Z.Lin