3

Is there any way I can use something like tidyverse's add_count() %>% filter() or distinct(), or alternatively janitor's get_dupes(), to find and keep the duplicated items of each column? There is no need to compare items of different columns with each other; each column needs to be considered separately.

library(tidyverse)

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3
) 

Expected output would be

colA colB
c       1
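
For a single column, something like the add_count() %>% filter() %>% distinct() combination mentioned above would work, but I need the same thing for every column separately:

data1 %>%
  add_count(colB) %>%   # count how often each colB value occurs
  filter(n > 1) %>%     # keep only values that occur more than once
  distinct(colB)        # report each duplicated value once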

    
mike
  • 49
  • 4

3 Answers

3

You can try map_dfc(), which will map over the columns and return a data frame by column-binding the outputs:

library(tidyverse)
data1  %>% 
  map_dfc(~.x[duplicated(.x)])

# A tibble: 1 x 2
  colA   colB
  <chr> <dbl>
1 c         1

However, this will result in unwanted behavior when the columns have different numbers of duplicates, due to recycling (when an operation requires two vectors to be the same length, like a column bind, R automatically repeats the shorter one until it is long enough to match the longer one).

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3,
  "d",   1
)

data1  %>% 
  map_dfc( ~.x[duplicated(.x)])

# A tibble: 2 x 2
  colA   colB
  <chr> <dbl>
1 c         1
2 c         1

Here colA has been recycled to match the length of colB. In such a case you are better off returning a list with map():

data1  %>% 
  map( ~.x[duplicated(.x)])
# output
$colA
[1] "c"

$colB
[1] 1 1
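
Since the question also mentions janitor's get_dupes(), a similar per-column approach is possible by wrapping each column in a one-column tibble (a sketch, assuming the janitor package is installed):

library(janitor)

data1 %>% 
  map(~ get_dupes(tibble(value = .x), value))

Each list element is then a data frame holding that column's duplicated values together with a dupe_count column.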
missuse
  • 19,056
  • 3
  • 25
  • 47
0

In base R:

# for each column, keep the duplicated values, reporting each value only once
duplicatedList <- lapply(data1, function(columnValues) {
  unique(columnValues[duplicated(columnValues)])
})
Jonas
  • 1,760
  • 1
  • 3
  • 12
0

Another base R option, using list2DF() (available from R 4.0.0):

> list2DF(Map(function(x) x[duplicated(x)], data1))
  colA colB
1    c    1
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81