Multiple duplicates (2 times, 3 times,...) in R

Question

After searching for a while, I could not find an existing answer to this question. Assume that I have the following vector:

v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")

How do I find the values that have more than 1 duplicate

(should be "c","c","c", "d", "d", "d", "d")

and more than 2 duplicates

(should be "d", "d", "d", "d")

The function duplicated(v) only flags values that repeat an earlier element; it does not say how many times each value occurs.

@AlexA. You are right. But, I am not sure whether the duplicates are consecutive or not. – akrun, Apr 30 '15 at 16:36
Suppose the vector is `c("a", "c", "c", "c", "d", "d", "d", "d", "c")`: would the results include the last `c` or not? – akrun, Apr 30 '15 at 16:40
@AlexA. But his example didn't include that type, so I was confused. – akrun, Apr 30 '15 at 16:42
@DuyBui: Do you want to list the elements the number of times they occur, e.g. `"d" "d" "d" "d"`, or do you just want a list of the elements that are duplicated that many times, e.g. `"d"`? – Alex A., Apr 30 '15 at 16:42
Hi Alex, I want the list of elements. It would be even better to get the indices of the elements, such as (4, 5, 6, 7, 8, 9, 10) for more than 1 duplicate. – Duy Bui, May 01 '15 at 08:50

score 7 · Answer 1 · answered Apr 30 '15 at 16:33

You can generate a table() and then check which elements of v are part of the relevant subset of the table, e.g.

R> v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")
R> tab <- table(v)
R> tab
v
a b c d 
1 2 3 4 
R> v[v %in% names(tab[tab > 2])]
[1] "c" "c" "c" "d" "d" "d" "d"
R> v[v %in% names(tab[tab > 3])]
[1] "d" "d" "d" "d"
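
Since the comments on the question also ask for the positions of the elements, here is a minimal sketch of the same table() idea that returns indices instead of values. Wrapping the condition in which() is an addition for illustration and not part of the original answer.

```r
v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")
tab <- table(v)

# Positions of values that occur more than 2 times
# (i.e. "more than 1 duplicate" in the question's terminology)
idx <- which(v %in% names(tab[tab > 2]))
idx
# [1]  4  5  6  7  8  9 10
```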

Achim Zeileis

Because this question came up: In case `v` is numeric you may want an additional `as.numeric(names(...))` to transform the table names back to numeric. But as pointed out in the comment below...it appears to work even without :-) – Achim Zeileis Apr 30 '15 at 16:44
Actually just realized that what I said was false. Surprisingly `1 %in% c("1", "2")` returns `TRUE`, so `as.numeric()` isn't needed even if `v` is numeric. – Alex A. Apr 30 '15 at 16:46
Note that if the OP simply wants a list of duplicated elements rather than a repeated list of duplicated elements, you can simply use `names(tab[tab > 1])` (likewise for 2, 3, ...) Or you can just wrap `unique()` around what you already have. (Waiting for the OP's confirmation on the desired form of the output.) – Alex A. Apr 30 '15 at 16:51
That's not the way I read the question...but as you say: it's easy to tweak the example if necessary. – Achim Zeileis Apr 30 '15 at 16:56
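
The two points raised in the comments above can be checked with a small sketch. The vector `v_num` is a made-up example, not taken from the question; it only illustrates that `%in%` matches numeric values against the character names of the table, and that `names()` alone gives each qualifying value once.

```r
# Hypothetical numeric input, for illustration only
v_num <- c(1, 2, 2, 3, 3, 3)
tab <- table(v_num)

# Works without as.numeric(): %in% compares after coercion to character
repeated <- v_num[v_num %in% names(tab[tab > 2])]
repeated
# [1] 3 3 3

# Each value occurring more than once, listed a single time (as character)
once <- names(tab[tab > 1])
once
# [1] "2" "3"
```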

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-04-30T18:15:35.793

5

I would use ave to write a simple function like this:

myFun <- function(vector, thresh) {
  ind <- ave(rep(1, length(vector)), vector, FUN = length)
  vector[ind > thresh + 1] ## added "+1" to match your terminology
}

Here it is applied to "v":

myFun(v, 1)
# [1] "c" "c" "c" "d" "d" "d" "d"
myFun(v, 2)
# [1] "d" "d" "d" "d"
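
The comments on the question ask what happens when duplicates are not consecutive. As a quick check, assuming that total occurrences are what matters, the function counts every occurrence regardless of position. The vector `v2` is the one suggested in those comments, and `myFun` is repeated here only so the snippet runs on its own.

```r
myFun <- function(vector, thresh) {
  # Number of occurrences of each element's value, aligned with the input
  ind <- ave(rep(1, length(vector)), vector, FUN = length)
  vector[ind > thresh + 1]
}

# "c" appears four times in total, three in a row plus one at the end
v2 <- c("a", "c", "c", "c", "d", "d", "d", "d", "c")
res <- myFun(v2, 2)
res
# [1] "c" "c" "c" "d" "d" "d" "d" "c"
```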

Of course, there is always "data.table":

library(data.table)
as.data.table(v)[, N := .N, by = v][N > 1 + 1]$v
# [1] "c" "c" "c" "d" "d" "d" "d"
as.data.table(v)[, N := .N, by = v][N > 2 + 1]$v
# [1] "d" "d" "d" "d"

edited Apr 30 '15 at 18:15

answered Apr 30 '15 at 16:35

How should one use this on a matrix; to get the rows that are duplicated only 3 times for example? – Ole Petersen Feb 11 '19 at 14:43
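
One possible way to carry the ave idea over to matrix rows, as a sketch rather than a tested recipe: build one key per row and count the keys. The matrix `m`, the separator `"\r"`, and the reading of "duplicated only 3 times" as "occurs exactly 3 times" are all assumptions made for illustration.

```r
# Hypothetical matrix: row (1, 2) occurs 3 times, (3, 4) twice, (5, 6) once
m <- matrix(c(1, 2,
              1, 2,
              1, 2,
              3, 4,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)

# One character key per row; the separator is assumed not to occur in the data
key <- do.call(paste, c(as.data.frame(m), sep = "\r"))

# Number of occurrences of each row's key, aligned with the rows
n <- ave(seq_along(key), key, FUN = length)

# Rows whose content occurs exactly 3 times
rows3 <- m[n == 3, , drop = FALSE]
rows3
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    2
# [3,]    1    2
```

Replacing `n == 3` with `n > 3` would follow the "more than" wording used in the rest of the thread.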
