My data set looks like:

id  1   2   3   4   5
v1  1   1   0   13  14
v2  1   2   0   13  2
v3  1   12  0   13  5

I have transposed it for display here; in the actual dataset, the first column shown above (id, v1, v2, v3) holds the column names, and each id is a row.

I want to identify the ids that have the same value in all of the columns v1 to v3, and then flag those ids.

So output will look like:

id  1   2   3   4   5
v1  1   1   0   13  14
v2  1   2   0   13  2
v3  1   12  0   13  5
flag 1  0   1   1   0

I tried various things but was not able to get this result. I could do it by computing sums in a loop, but that would take a lot of time because my dataset is very large.

I would be really grateful if you could help me with a simple approach to this problem.

nicola

2 Answers

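We can use rowSums: compare every value column with the first one (v1) and flag the rows where all of the comparisons are TRUE.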

df1$flag <- +(rowSums(df1[,2]==as.matrix(df1[-1]))==(ncol(df1)-1))
df1$flag
#[1] 1 0 1 1 0
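
For anyone who wants to run this, here is a minimal sketch that rebuilds the example data from the question and splits the one-liner into steps. The intermediate names `vals` and `same` are only illustrative and are not part of the answer above; the sketch assumes the value columns are numeric and contain no NA.

```r
# Example data from the question (ids in rows, v1..v3 in columns).
df1 <- data.frame(
  id = 1:5,
  v1 = c(1, 1, 0, 13, 14),
  v2 = c(1, 2, 0, 13, 2),
  v3 = c(1, 12, 0, 13, 5)
)

vals <- as.matrix(df1[-1])   # numeric matrix holding v1..v3
same <- vals == df1[, 2]     # v1 is recycled down each column, so row i is compared with v1[i]

# A row is flagged when all of its columns match v1; unary + turns TRUE/FALSE into 1/0.
df1$flag <- +(rowSums(same) == ncol(vals))
df1$flag
#[1] 1 0 1 1 0
```

The comparison relies on R storing matrices column by column, so a vector with one element per row lines up with each column in turn.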

Or a slightly faster option:

+(Reduce(`&`, lapply(df1[-1],`==`, df1[,2])))
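
The same idea spelled out step by step, as a sketch on the example data from the question. The name `eq_list` is only illustrative; the sketch assumes there are no NA values, since a comparison with NA gives NA rather than FALSE.

```r
# Example data from the question.
df1 <- data.frame(id = 1:5,
                  v1 = c(1, 1, 0, 13, 14),
                  v2 = c(1, 2, 0, 13, 2),
                  v3 = c(1, 12, 0, 13, 5))

# One logical vector per value column: TRUE where that column equals v1.
eq_list <- lapply(df1[-1], `==`, df1[, 2])

# AND the vectors together element-wise, then coerce TRUE/FALSE to 1/0.
df1$flag <- +Reduce(`&`, eq_list)
df1$flag
#[1] 1 0 1 1 0
```

This works on the data frame's columns directly, so no matrix copy of the data is created.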
RHertel
akrun

One possibility is to check whether the values within each row have zero variance:

df1$flag <- +!apply(df1[-1],1,var)
df1
#  id v1 v2 v3 flag
#1  1  1  1  1    1
#2  2  1  2 12    0
#3  3  0  0  0    1
#4  4 13 13 13    1
#5  5 14  2  5    0
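
To see why this works, here is a small sketch on the example data that keeps the row variances in an intermediate vector (`row_var` is only an illustrative name). A variance of 0 is negated to TRUE, and any non-zero variance to FALSE. One caveat worth checking on real data: with the default arguments, `var` returns NA for a row that contains an NA, so the flag for that row would be NA as well.

```r
# Example data from the question.
df1 <- data.frame(id = 1:5,
                  v1 = c(1, 1, 0, 13, 14),
                  v2 = c(1, 2, 0, 13, 2),
                  v3 = c(1, 12, 0, 13, 5))

# Sample variance of v1..v3 for each id; for this data: 0, 37, 0, 0, 39.
row_var <- apply(df1[-1], 1, var)

# !row_var is TRUE only where the variance is 0; unary + converts to 1/0.
df1$flag <- +!row_var
df1$flag
```

Note that `apply` calls `var` once per row, so on a very large dataset it is worth timing this against the column-wise approaches.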
RHertel