
Using the tidyverse, I would like to expand a character column so that every pairwise combination of its elements (separated by " & ") is reported on its own row.

I tried decomposing the problem with t(combn(unlist(strsplit(x, " & ")), 2)), but it fails with an error when there is no " & ", because combn() cannot choose 2 elements from a single one.
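This is a minimal base-R sketch of that failure and one way around it. The fix is a guard that returns a lone element unchanged and only calls combn() when there are at least two parts. pair_combos() is a hypothetical helper name used here for illustration.

```r
# Reproduce the failure from the question: one element, but pairs requested
err <- tryCatch(t(combn(unlist(strsplit("A", " & ")), 2)),
                error = function(e) e)
inherits(err, "error")
# [1] TRUE

# Hypothetical helper: all unordered pairs of one " & "-separated string
pair_combos <- function(x, sep = " & ") {
  parts <- strsplit(x, sep, fixed = TRUE)[[1]]
  if (length(parts) < 2) return(parts)  # nothing to pair: keep as-is
  # FUN is applied to each pair, so the result is a character vector
  combn(parts, 2, FUN = paste, collapse = sep)
}

pair_combos("A")
# [1] "A"
pair_combos("A & B")
# [1] "A & B"
pair_combos("C & D & E")
# [1] "C & D" "C & E" "D & E"
```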

In the example below:

  • "A" remains "A" (or becomes "A & A")
  • "A & B" remains "A & B"
  • "C & D & E" becomes "C & D", "C & E" and "D & E" in three different rows

Note (1): I cannot predict the number of elements in advance ("A & B & C & D...").
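combn() enumerates whatever it is given, so the number of elements does not need to be known in advance. An entry with n elements expands to choose(n, 2) rows, and a single element is kept as one row. The quick check below illustrates that arithmetic. n_rows() is a hypothetical helper introduced for illustration.

```r
# Hypothetical helper: rows produced by an entry with n elements.
# choose(n, 2) pairs for n >= 2; a single element is kept as one row.
n_rows <- function(n) ifelse(n < 2, 1, choose(n, 2))

n_rows(1:6)
# [1]  1  1  3  6 10 15

# Applied to the example strings: 1 + 1 + 3 = 5 rows in total
x <- c("A", "A & B", "C & D & E")
sum(n_rows(lengths(strsplit(x, " & ", fixed = TRUE))))
# [1] 5
```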

Note (2): Order is not important (i.e. "C & D" == "D & C")

Note (3): The result will feed into a separate function and be used in an igraph application.
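Notes (2) and (3) suggest building an undirected edge list, which can be sketched in base R. Sorting each entry's elements before calling combn() means "D & C" and "C & D" yield the same pair. Each column returned by combn() is one pair, so t() turns the result into a two-column matrix. Single-element entries such as "A" contribute no edge; if they should appear as isolated vertices, they would need to be added separately. Passing the matrix to igraph::graph_from_edgelist() is an assumption about how it might be consumed and is not run here.

```r
# Build an undirected edge list: one row per unordered pair of elements
combinations <- c("A", "A & B", "C & D & E")

edge_list <- do.call(rbind, lapply(combinations, function(x) {
  # sort() makes the pair order-insensitive ("D & C" == "C & D")
  parts <- sort(unique(strsplit(x, " & ", fixed = TRUE)[[1]]))
  if (length(parts) < 2) return(NULL)  # single element: no edge
  t(combn(parts, 2))                   # each combn() column is a pair
}))
edge_list
# rows: A-B, C-D, C-E, D-E

# Assumption (igraph installed):
# g <- igraph::graph_from_edgelist(edge_list, directed = FALSE)
```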

Thanks in advance.

data <- data.frame(names = 1:3, combinations = c("A", "A & B", "C & D & E"), stringsAsFactors = FALSE)

  names combinations
1     1            A
2     2        A & B
3     3    C & D & E

expected <- data.frame(names = c(1, 2, 3, 3, 3), combinations = c("A", "A & B", "C & D", "C & E", "D & E"))

  names combinations
1     1            A
2     2        A & B
3     3        C & D
4     3        C & E
5     3        D & E
MCS

3 Answers


You can use combn to create the pairwise combinations within each group of names:

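library(dplyr)
library(tidyr)

data %>%
  # one row per element
  separate_rows(combinations, sep = " & ") %>%
  group_by(names) %>%
  # pairs within each group; a single element is kept as-is.
  # reframe() needs dplyr >= 1.1.0 (older versions used summarise() here)
  reframe(combinations = if (n() > 1)
            combn(combinations, 2, paste0, collapse = " & ") else combinations)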

#  names combinations
#  <int> <chr>       
#1     1 A           
#2     2 A & B       
#3     3 C & D       
#4     3 C & E       
#5     3 D & E       
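The same split, group and combn steps can be sketched in base R, assuming dplyr and tidyr are not available. split() plays the role of group_by(), and lapply() over the groups plays the role of summarise()/reframe(). One difference: split() returns the groups in sorted order of names, which matches the example data.

```r
data <- data.frame(names = 1:3,
                   combinations = c("A", "A & B", "C & D & E"),
                   stringsAsFactors = FALSE)

# Step 1 (like separate_rows): one row per element
parts <- strsplit(data$combinations, " & ", fixed = TRUE)
long <- data.frame(names = rep(data$names, lengths(parts)),
                   combinations = unlist(parts),
                   stringsAsFactors = FALSE)

# Step 2 (like group_by + reframe): pairs within each group
by_name <- split(long$combinations, long$names)
pairs <- lapply(by_name, function(x) {
  if (length(x) > 1) combn(x, 2, FUN = paste, collapse = " & ") else x
})

# Step 3: stack the per-group results back into a data frame
result <- data.frame(names = rep(as.integer(names(pairs)), lengths(pairs)),
                     combinations = unlist(pairs, use.names = FALSE),
                     stringsAsFactors = FALSE)
result
```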
Ronak Shah

A data.table option:

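library(data.table)

setnames(
  setDT(data)[               # converts data to a data.table in place
    ,
    {
      # split this group's string, then build all pairs
      s <- unlist(strsplit(combinations, " & "))
      if (length(s) == 1) s else combn(s, 2, paste0, collapse = " & ")
    },
    by = names               # evaluate j once per value of names
  ], "V1", "combinations"    # an unnamed vector from j becomes column V1
)[]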

gives

   names combinations
1:     1            A
2:     2        A & B
3:     3        C & D
4:     3        C & E
5:     3        D & E
ThomasIsCoding

A data.table approach using cSplit() from splitstackshape:

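library(data.table)       # setnames() and .N
library(splitstackshape)  # cSplit(); direction = "long" gives one element per row
setnames(cSplit(data, "combinations", sep = " & ", direction = "long",
                type.convert = FALSE)[,
         if (.N > 1) combn(combinations, 2, FUN = paste, collapse = " & ") else
           combinations, by = names], "V1", "combinations")[]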
#   names combinations
#1:     1            A
#2:     2        A & B
#3:     3        C & D
#4:     3        C & E
#5:     3        D & E
akrun