Test for Duplicate Quickly in Matlab Array

Question

I have two matrices S and T which have n columns and a row vector v of length n. By my construction, I know that S does not have any duplicates. What I'm looking for is a fast way to find out whether or not the row vector v appears as one of the rows of S. Currently I'm using the test

if min([sum(abs(S - repmat(v,size(S,1),1)),2); sum(abs(T - repmat(v,size(T,1),1)),2)]) ~= 0 ...

When I first wrote it, I had a for loop testing each (I knew this would be slow, I was just making sure the whole thing worked first). I then changed this to defining a matrix diff by the two components above and then summing, but this was slightly slower than the above.

Everything I've found online says to use the function unique. However, this is very slow because it also sorts my matrix. I don't need the sorting, and it's a massive waste of time. This test is the bottleneck in my code, taking nearly 90% of the run time. If anyone has any advice on how to speed this up, I'd be most appreciative!

I imagine there's a fairly straightforward way, but I'm not that experienced with Matlab (fairly, just not lots). I know how to use basic stuff, but not some of the more specialist functions.

Thanks!

To clarify following Sardar_Usama's comment, I want this to work for a matrix with any number of rows and a single vector. I'd forgotten to mention that the elements are all in the set {0,1,...,q-1}. I don't know whether that helps or not to make it faster!

rahnema1 · Answer 1 · 2016-10-20T17:16:41.107

You may want this:

ismember(v,S,'rows')

Swap the arguments S and v to get the logical indices of the rows of S that equal v:

ismember(S,v,'rows')

Or, to test whether v is a row of S:

  any(all(bsxfun(@eq,S,v),2))

This returns the logical indices of all duplicates:

 all(bsxfun(@eq,S,v),2)
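
As a small worked example of the two approaches above (the matrix S and the vectors v and w are made up for illustration, not taken from the asker's data), both tests give the same answer:

```matlab
% Illustrative data only: three distinct rows with entries in {0,1,2}.
S = [0 1 2; 2 1 0; 1 1 1];
v = [2 1 0];   % equal to row 2 of S
w = [2 2 2];   % not a row of S

% Membership test via ismember with the 'rows' option.
inS_ismember = ismember(v, S, 'rows');

% Membership test via bsxfun: compare v against every row of S,
% then require all n columns of a row to match.
rowMatches = all(bsxfun(@eq, S, v), 2);   % one logical entry per row of S
inS_bsxfun = any(rowMatches);

% The same test for a vector that does not occur in S.
notInS = any(all(bsxfun(@eq, S, w), 2));
```

In MATLAB R2016b and later, implicit expansion should allow the comparison to be written as all(S == v, 2) without bsxfun.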

edited Oct 20 '16 at 17:16

answered Oct 20 '16 at 16:56

rahnema1


Thanks for that. Unfortunately, while that is exactly what I'm looking for in the sense that it gives a `1` if there is a duplicate and a `0` if there isn't, it's far slower than my code above! :( -- Takes approximately 3 times as long! :/ – Sam OT Oct 20 '16 at 17:06
I note that `ismember` includes the function `sortrows`, like `unique` does. Maybe this is the issue. I'll try what you've just written (I have no idea what it means though!) :) – Sam OT Oct 20 '16 at 17:15
I was thinking something to do with `bsxfun` -- maybe I should have looked into the description more thoroughly. Using `bsxfun` gives me the same speed as my code in the original post. I'll try varying the parameters a bit. – Sam OT Oct 20 '16 at 17:20
@SmileySam Sorry for the late reply, I was busy with the code. I will provide a benchmark and possibly add more methods. Yes, `bsxfun` is faster than `ismember`. I can say `ismember` is not a good choice here! – rahnema1 Oct 20 '16 at 17:28
I've just finished running some long-time tests. My code took approximately 100secs to run and the bsxfun code took 80. That's a pretty decent improvement! It seems like such a basic task that I was hoping there would be a way to reduce it to, say, 5secs, but probably not! It was called 353280 times, which is quite a few! – Sam OT Oct 20 '16 at 17:30
@SmileySam what is the dimension of your data? – rahnema1 Oct 20 '16 at 17:32
That depends on my parameters, but it's basically the number of different proper colourings obtainable in `k` but not `k-1` steps, starting from a certain fixed one, and then run over `k` until all proper colourings are found. (The graph is just the `n`-path.) – Sam OT Oct 20 '16 at 17:34
@SmileySam I do not know how your code works, but if S is constant and the comparison is done many times, it is better to hash your data. This topic may help: http://stackoverflow.com/a/40070562/6579744 – rahnema1 Oct 20 '16 at 17:37
So sort of. I said about `S` and `T`. `S` remains constant and `T` gets rows added to it. `T` doesn't normally get that big -- it's the number of new colourings available whereas `S` is all the previous ones. So I could maybe hash `S` and compare and not hash `T` and compare. – Sam OT Oct 20 '16 at 17:40
Tell you what, there's more information that I need to get out of my code first before I get overly concerned about the speed. For the moment I'll leave it with `bsxfun` -- a 20% gain is significant! Maybe once I've got all the info out, I'll return to this if I'm in need of some more speed. Sound ok? :) – Sam OT Oct 20 '16 at 17:42
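
A minimal sketch of the hashing idea raised in the comments, assuming the entries lie in {0,1,...,q-1} as stated in the question and that q^n is small enough to be held exactly in a double (below 2^53). Each row is encoded as a single base-q integer, so the row comparison becomes a scalar comparison. The values of q, S, v and w are illustrative, and the names weights and keysS are hypothetical, not from the original code.

```matlab
q = 3;                          % assumed alphabet size (illustrative)
S = [0 1 2; 2 1 0; 1 1 1];      % illustrative rows with entries in {0,...,q-1}
n = size(S, 2);

% Place values q^(n-1), ..., q^1, q^0 as a column vector.
weights = q .^ (n-1:-1:0).';

% Encode every row of S once; distinct rows give distinct keys.
keysS = S * weights;

% Test a candidate row by encoding it and comparing scalars.
v = [2 1 0];
vInS = any(keysS == v * weights);

% A row that does not occur in S.
w = [2 2 2];
wInS = any(keysS == w * weights);
```

Since S stays constant between calls, keysS would only need to be computed once; each new row of T would add one key. Whether this beats the bsxfun test depends on the sizes involved, so it would need benchmarking.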
