3

Is there any way I can use something like tidyverse's add_count() %>% filter() or distinct(), or alternatively janitor's get_dupes(), to find and keep the duplicated items of each column? There is no need to compare items of different columns with each other; each column needs to be considered separately.

library(tidyverse)

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3
) 

Expected output would be

colA colB
c       1
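
For a single column, something like the add_count() %>% filter() %>% distinct() combination mentioned above would work, but I need the same thing for every column separately:

data1 %>%
  add_count(colB) %>%   # count how often each colB value occurs
  filter(n > 1) %>%     # keep only values that occur more than once
  distinct(colB)        # report each duplicated value once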

    
mike
  • 49
  • 4

3 Answers

3

You can try map_dfc(), which will map over the columns and return a data frame by column-binding the outputs:

library(tidyverse)
data1  %>% 
  map_dfc(~.x[duplicated(.x)])

# A tibble: 1 x 2
  colA   colB
  <chr> <dbl>
1 c         1

However, this will result in unwanted behavior when the columns have different numbers of duplicates, due to recycling (when an operation requires two vectors to be the same length, like a column bind, R automatically repeats the shorter one until it is long enough to match the longer one).

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3,
  "d",   1
)

data1  %>% 
  map_dfc( ~.x[duplicated(.x)])

# A tibble: 2 x 2
  colA   colB
  <chr> <dbl>
1 c         1
2 c         1

Here colA has been recycled to match the length of colB. In such a case you are better off returning a list with map():

data1  %>% 
  map( ~.x[duplicated(.x)])
# output
$colA
[1] "c"

$colB
[1] 1 1
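
Since the question also mentions janitor's get_dupes(), a similar per-column approach is possible by wrapping each column in a one-column tibble (a sketch, assuming the janitor package is installed):

library(janitor)

data1 %>% 
  map(~ get_dupes(tibble(value = .x), value))

Each list element is then a data frame holding that column's duplicated values together with a dupe_count column.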
missuse
  • 19,056
  • 3
  • 25
  • 47
0

In base R:

# for each column, keep the duplicated values, reporting each value only once
duplicatedList <- lapply(data1, function(columnValues) {
  unique(columnValues[duplicated(columnValues)])
})
Jonas
  • 1,760
  • 1
  • 3
  • 12
0

Another base R option, using list2DF() (available from R 4.0.0):

> list2DF(Map(function(x) x[duplicated(x)], data1))
  colA colB
1    c    1
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81