
Suppose I had the following matrix:

matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3)

Result:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
[3,]    2    2    2
[4,]    1    1    1

How can I filter/subset this matrix by whether or not each row has duplicate values? For example, in this case, I would only want to keep row 1 and row 2.

Any thoughts would be much appreciated!

eyio

3 Answers


Try this (I suspect it will be faster than any `apply` approach):

# Keep rows where not every value equals the first column (drops all-identical rows)
mat[rowSums(mat == mat[, 1]) != ncol(mat), ]
# ---with your object---
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
IRTFM
  • Yep - 0.03 seconds for a 1M row, 4 col matrix over here. Impressive. – thelatemail Jun 18 '15 at 23:46
  • Vectorized functions like `rowSums` and `==` beat `apply`/loops every time. – IRTFM Jun 18 '15 at 23:49
  • Just realised this returns a positive where there is for instance `c(1,2,1)` in a row. – thelatemail Jun 19 '15 at 00:02
  • This is great! I knew there was a faster way – Pierre L Jun 19 '15 at 00:02
  • @thelatemail: Yes. That was how it was intended. That's what I understood the request to be, but changing the test could make it return a different set of rows. – IRTFM Jun 19 '15 at 00:16
  • @BondedDust thanks! this approach works wonderfully. I have modified it to this `mat[rowSums(mat == mat[,1])==1 & rowSums(mat == mat[,2])==1, ]` so that it gives me rows with all distinct values – eyio Jun 19 '15 at 00:17
  • After testing, this code only works for thin data sets. You would have to repeat the indexing ncol minus one times. I take my vote away! :) – Pierre L Jun 19 '15 at 00:55
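The comments above point out that the `rowSums` test only removes rows whose values are all identical, and that the modified version has to be repeated for wider matrices. One possible way to generalize it to any number of columns while staying vectorized over rows is to compare every pair of columns. This is only a sketch: `has_row_dup` is a hypothetical helper name that does not appear in the thread, and it assumes the matrix contains no `NA` values.

```r
# Sketch: flag rows with any repeated value by comparing every pair of columns.
# `has_row_dup` is a hypothetical helper (not from the thread); assumes no NAs.
has_row_dup <- function(mat) {
  dup <- logical(nrow(mat))                 # starts as all FALSE
  nc <- ncol(mat)
  if (nc < 2) return(dup)                   # one column cannot repeat within a row
  for (j in seq_len(nc - 1)) {
    for (k in (j + 1):nc) {
      dup <- dup | (mat[, j] == mat[, k])   # each comparison is vectorized over rows
    }
  }
  dup
}

mat <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3)
mat2 <- rbind(mat, c(1, 2, 1))              # extra row with a partial repeat

# Original test: keeps the c(1, 2, 1) row because not all values are equal
kept_not_all_equal <- mat2[rowSums(mat2 == mat2[, 1]) != ncol(mat2), , drop = FALSE]

# Pairwise test: keeps only rows whose values are all distinct
kept_all_distinct <- mat2[!has_row_dup(mat2), , drop = FALSE]
```

The loop runs over column pairs rather than rows, so it should suit matrices with many rows and few columns; for wide matrices the number of pairs grows quadratically and a row-wise `apply` may be the simpler choice.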
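m <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol=3)  # the matrix from the question
# TRUE for rows that contain no repeated value
indx <- apply(m, 1, function(x) !any(duplicated(x)))
m[indx, ]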
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2

This second one is just for fun. It works because a row has no duplicates exactly when the number of unique values equals the length of the row.

indx2 <- apply(m, 1, function(x) length(unique(x)) == length(x))
m[indx2,]
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
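To check that the two `apply` versions agree on more than the four-row example, something like the following could be used. It is a rough sketch: the matrix `m_big`, its size, and the seed are arbitrary choices made for illustration, and the timings will vary by machine, so no numbers are quoted here.

```r
# Sketch: compare the two apply-based tests on random data.
# `m_big`, its dimensions and the seed are arbitrary illustrative choices.
set.seed(1)
m_big <- matrix(sample(1:5, 3e4, replace = TRUE), ncol = 3)

indx_a <- apply(m_big, 1, function(x) !any(duplicated(x)))
indx_b <- apply(m_big, 1, function(x) length(unique(x)) == length(x))

# Both tests should select the same rows
same_rows <- identical(indx_a, indx_b)

# Rough timings; results depend on the machine and R version
time_a <- system.time(apply(m_big, 1, function(x) !any(duplicated(x))))
time_b <- system.time(apply(m_big, 1, function(x) length(unique(x)) == length(x)))
```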
Pierre L

Here is my approach, which is a little shorter. It uses the `anyDuplicated` function, which should be faster.

mat[!apply(mat, 1, anyDuplicated), ]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
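The negation works because `anyDuplicated` returns `0` when a vector has no duplicates and otherwise the index of the first duplicated element, so `!` turns `0` into `TRUE` and any positive index into `FALSE`. A slightly more explicit version, offered as a sketch, compares against `0` and adds `drop = FALSE` so the result stays a matrix even if only one row survives. The variable names `first_dup`, `keep` and `result` are just illustrative.

```r
# Sketch: the same filter written out step by step.
mat <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3)

# One integer per row: 0 if no duplicates, else index of the first duplicate
first_dup <- apply(mat, 1, anyDuplicated)

keep <- first_dup == 0                  # explicit form of !apply(...)
result <- mat[keep, , drop = FALSE]     # drop = FALSE keeps a matrix for 1-row results
```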
SabDeM