how to intersect elements of one column along with group_by in R

Question

Lets assume, my data is like

  group_id    col1
1        1     A,B
2        1     B,C
3        2     A,C
4        2     B,D
5        3     A,D
6        3 A,B,C,D

I would like to summarise/mutate col1 where its elements are intersected within same group (over group_id). My required output is like (if summarising)

  group_id col1
1        1    B
2        2 <NA>
3        3  A,D

or like this (if mutating)

  group_id col1
1        1    B
2        1    B
3        2 <NA>
4        2 <NA>
5        3  A,D
6        3  A,D

I can easily create a union by using function toString but scratching my head for how to have common elements in the output. Basically intersect requires at least two arguments and is therefore not working here.

dput(df) is as under

df <-  structure(list(group_id = c(1L, 1L, 2L, 2L, 3L, 3L), col1 = c("A,B", 
"B,C", "A,C", "B,D", "A,D", "A,B,C,D")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

score 1 · Accepted Answer · answered Mar 05 '21 at 07:52

1

You can split col1 on comma and use Reduce + intersect to get common values in each group_id.

library(dplyr)
df %>%
  group_by(group_id) %>%
  summarise(col1 = toString(Reduce(intersect, strsplit(col1, ','))))

#  group_id col1  
#*    <int> <chr> 
#1        1 "B"   
#2        2 ""    
#3        3 "A, D"

answered Mar 05 '21 at 07:52

Ronak Shah

377,200
20
156
213

Thanks for the response @Ronak. It is working as expected. Let me check other answers before accepting one. – AnilGoyal Mar 05 '21 at 08:04

score 1 · Answer 2 · answered Mar 05 '21 at 07:57

Would this work:

library(dplyr)
library(tidyr)
df %>% separate_rows(col1) %>% 
   group_by(group_id, col1) %>% filter(n()>1) %>% 
   distinct() %>% group_by(group_id) %>% summarise(col1 = toString(col1)) %>% 
   right_join(df %>% select(group_id) %>% distinct()) %>% 
   arrange(group_id)
`summarise()` ungrouping output (override with `.groups` argument)
Joining, by = "group_id"
# A tibble: 3 x 2
  group_id col1 
     <int> <chr>
1        1 B    
2        2 NA   
3        3 A, D

score 1 · Answer 3 · answered Mar 05 '21 at 07:58

One option using dplyr and tidyr could be:

df %>%
 separate_rows(col1) %>%
 count(group_id, col1) %>%
 group_by(group_id) %>%
 summarise(col1 = if_else(all(n == 1), NA_character_, paste(col1[n == 2], collapse = ",")))

  group_id col1 
     <int> <chr>
1        1 B    
2        2 <NA> 
3        3 A,D

how to intersect elements of one column along with group_by in R

3 Answers3