I want to identify duplicated characters in grouped lists

Consider the following example data frame:

ID<-c("Carl", "Carl","Carl","Peter","Peter","Peter")
Question<-c("need","need","need","dyadic","dyadic","dyadic")
V1<-c("A1","A2","C0","A3","A3","A1")
df<-data.frame(ID,Question,V1)

I am using the following code to list the V1 values per group:

df |>
    summarize(present_codes = list(V1), .by = c(ID, Question))

And would like the output to be a new column identifying the duplicated characters ('duplicated_codes') within each grouped list as below:

ID     Question  present_codes        duplicated_codes
Carl   need      c("A1", "A2", "C0")  character(0)
Peter  dyadic    c("A3", "A3", "A1")  c("A3")

I am trying to use a combination of mutate(), lapply() and x[duplicated(x)], but I get the error message '"FUN" is missing' when running the line below, although x[duplicated(x)] works on a single vector. I am very new to the tidyverse and to lapply(), so I am probably making a simple mistake. The actual dataset has more than 40,000 rows.

 |>
    mutate(duplicated_codes=lapply(x=present_codes,x[duplicated(x)]))

Thanks a lot in advance!

Bettina

1 Answer

One option would be to first identify duplicates:

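library(dplyr)  # the .by argument needs dplyr >= 1.1.0

# Keep only the rows whose V1 value already appeared earlier in the same group
dupes <- df %>% 
  filter(duplicated(V1), .by = c(ID, Question)) %>%
  rename(duplicated_codes = V1)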

Then, building on your existing code, add a dplyr::left_join() call:

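df %>%
  summarize(present_codes = list(V1), .by = c(ID, Question)) %>%
  # Name the join keys explicitly instead of relying on the "Joining with" default
  left_join(dupes, by = c("ID", "Question"))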

Output:

     ID Question present_codes duplicated_codes
1  Carl     need    A1, A2, C0             <NA>
2 Peter   dyadic    A3, A3, A1               A3
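
One caveat with the join approach, sketched below with made-up data (`df2`, `dupes2` and `joined` are hypothetical names, not part of the question): `duplicated_codes` is a plain character column with `NA` for groups without duplicates rather than a list column with `character(0)`, and a group in which several codes repeat should come back as several rows.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

# Hypothetical data: a single group in which two different codes repeat
df2 <- data.frame(
  ID = rep("Peter", 4),
  Question = rep("dyadic", 4),
  V1 = c("A3", "A3", "A1", "A1")
)

# Same steps as above, applied to df2
dupes2 <- df2 |>
  filter(duplicated(V1), .by = c(ID, Question)) |>
  rename(duplicated_codes = V1)

joined <- df2 |>
  summarize(present_codes = list(V1), .by = c(ID, Question)) |>
  left_join(dupes2, by = c("ID", "Question"))

# The join returns one row per duplicated code, not one row per group
nrow(joined)
```

If you need exactly one row per group, the single-call version is probably the safer choice.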

Or all in one go, per the comment from @Ben:

df |> summarise(present_codes = list(V1), 
                duplicated_codes = list(V1[duplicated(V1)]), 
                .by = c(ID, Question))
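
As for the error in your original attempt, here is a minimal sketch of what I believe is going on. `lapply()` names its arguments `X` and `FUN`, and R is case-sensitive, so `x = present_codes` does not match `X`. The unnamed expression `x[duplicated(x)]` is then taken as `X`, and nothing is left to fill `FUN`. Passing the list first and wrapping the expression in an anonymous function should fix it (`res` is just an illustrative name):

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

df <- data.frame(
  ID = c("Carl", "Carl", "Carl", "Peter", "Peter", "Peter"),
  Question = c("need", "need", "need", "dyadic", "dyadic", "dyadic"),
  V1 = c("A1", "A2", "C0", "A3", "A3", "A1")
)

res <- df |>
  summarize(present_codes = list(V1), .by = c(ID, Question)) |>
  # lapply(X, FUN): the list goes first, then a function applied to each element
  mutate(duplicated_codes = lapply(present_codes, function(x) x[duplicated(x)]))

res$duplicated_codes
```

This keeps `duplicated_codes` as a list column, with `character(0)` for groups that have no duplicates.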
jpsmith
  • Could you consider simplifying in a single `summarise` statement, such as: `df |> summarise(present_codes = list(V1), duplicated_codes = list(V1[duplicated(V1)]), .by = c(ID, Question))`? – Ben Jun 25 '23 at 01:25
  • @Ben Thanks! I included your suggestion in the edit. – jpsmith Jun 25 '23 at 17:09