Identify rows that have duplicate values across multiple columns in R?

Question

My data set looks like:

id  1   2   3   4   5
v1  1   1   0   13  14
v2  1   2   0   13  2
v3  1   12  0   13  5

I have transposed it for display here; in the actual data set, the entries in the first column shown above (id, v1, v2, v3) are the column names.
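For reproducibility, the untransposed data could be built roughly as follows. This is only a sketch: the name `df1` is borrowed from the answers, and the numeric column types are an assumption.

```r
# Untransposed layout: one row per id, one column per variable.
# The object name df1 and the numeric types are assumptions for illustration.
df1 <- data.frame(
  id = 1:5,
  v1 = c(1, 1, 0, 13, 14),
  v2 = c(1, 2, 0, 13, 2),
  v3 = c(1, 12, 0, 13, 5)
)
df1
```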

Now I want to identify the ids that have the same value in all of the columns v1 to v3, and then flag those ids.

So the output would look like this:

id  1   2   3   4   5
v1  1   1   0   13  14
v2  1   2   0   13  2
v3  1   12  0   13  5
flag 1  0   1   1   0

I have tried various things but have not been able to get this result. I could do it by computing sums in a loop, but that would take a long time because my data set is very large.

I would be really grateful for a simple approach to this problem.

score 2 · Answer 1 · edited Mar 29 '16 at 10:50


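We can use rowSums: compare every value column with the first one (v1) and flag the rows in which all comparisons are TRUE.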

df1$flag <- +(rowSums(df1[,2]==as.matrix(df1[-1]))==(ncol(df1)-1))
df1$flag
#[1] 1 0 1 1 0

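Or a slightly faster option (run it on the original df1, before the flag column has been added, because df1[-1] selects every column except id):

+(Reduce(`&`, lapply(df1[-1], `==`, df1[,2])))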
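The one-liner above can be unpacked into separate steps. The following is a sketch rather than the answerer's exact code: the data frame is rebuilt from the question, and the intermediate names (`same_as_first`, `all_equal`) are hypothetical.

```r
# Sample data from the question (untransposed); names are assumptions.
df1 <- data.frame(
  id = 1:5,
  v1 = c(1, 1, 0, 13, 14),
  v2 = c(1, 2, 0, 13, 2),
  v3 = c(1, 12, 0, 13, 5)
)

# Compare each value column with the first value column (v1).
# The result is a list holding one logical vector per column.
same_as_first <- lapply(df1[-1], `==`, df1[, 2])

# Element-wise AND across the list: TRUE only where every column equals v1.
all_equal <- Reduce(`&`, same_as_first)

# Unary plus coerces the logical vector to integer 0/1.
flag <- +all_equal
flag
```

Note that a row containing NA will not come out as 1 with this approach, since comparisons with NA are not TRUE; whether that is the desired behaviour depends on the data.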

edited Mar 29 '16 at 10:50 by RHertel
answered Mar 29 '16 at 10:35 by akrun

score 1 · Answer 2 · answered Mar 29 '16 at 10:46

Another possibility is to check whether the values within each row have zero variance:

df1$flag <- +!apply(df1[-1],1,var)
#  id v1 v2 v3 flag
#1  1  1  1  1    1
#2  2  1  2 12    0
#3  3  0  0  0    1
#4  4 13 13 13    1
#5  5 14  2  5    0
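Since `apply` works row by row, it may be slow on a very large data set. A vectorised variant of the same idea (all values in a row are equal exactly when the row minimum equals the row maximum) could look like the sketch below. It is not part of the original answer, it has not been benchmarked here, and the names `vals`, `row_min` and `row_max` are hypothetical.

```r
# Sample data from the question (untransposed); names are assumptions.
df1 <- data.frame(
  id = 1:5,
  v1 = c(1, 1, 0, 13, 14),
  v2 = c(1, 2, 0, 13, 2),
  v3 = c(1, 12, 0, 13, 5)
)

# Value columns only (everything except id).
vals <- df1[-1]

# pmin/pmax take the element-wise minimum/maximum across their arguments;
# do.call passes each column of vals as a separate argument.
row_min <- do.call(pmin, vals)
row_max <- do.call(pmax, vals)

# A row has identical values in all columns when its min equals its max.
df1$flag <- +(row_min == row_max)
df1
```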
