I have a matrix index in Matlab with size GxN and a matrix A with size MxN.

Let me provide an example before presenting my question.

clear
N=3;
G=2;
M=5;

index=[1  2  3;
       13 14 15]; %GxN

A=[1  2  3; 
   5  6  7; 
   21 22 23; 
   1  2  3;
   13 14 15]; %MxN

I would like your help to construct a matrix Response with size GxM with Response(g,m)=1 if the row A(m,:) is equal to index(g,:) and zero otherwise.

Continuing the example above

Response= [1 0 0 1 0; 
           0 0 0 0 1]; %GxM

This code does what I want (taken from a previous question of mine - just to clarify: the current question is different)

Response=permute(any(all(bsxfun(@eq, reshape(index.', N, [], G), permute(A, [2 3 4 1])), 1), 2), [3 4 1 2]);

However, the command is extremely slow for my real matrix sizes (N=19, M=500, G=524288). I understand that a huge speedup may not be possible, but any improvement is welcome.

Dev-iL
TEX
  • I highly doubt this can be improved much. You may get somewhere by breaking the one-liner into pieces and timing each piece – Ander Biguri Oct 03 '18 at 10:16

3 Answers

MATLAB has a multitude of functions for working with sets, including setdiff, intersect, union etc. In this case, you can use the ismember function:

[~, Loc] = ismember(A,index,'rows');

Which gives:

Loc =
     1
     0
     0
     1
     2

And Response would be constructed as follows (the comparison relies on implicit expansion, available from R2016b):

Response = (1:size(index,1) == Loc).';

Response =
  2×5 logical array
   1   0   0   1   0
   0   0   0   0   1
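
A minimal sketch of two variations on the construction above, using the example data from the question. The first uses bsxfun, so it should also run on releases without implicit expansion. The second stores Response as a sparse logical matrix; that this helps at large G is an assumption, not something measured here. The names ResponseDense, ResponseSparse and hit are illustrative only. One caveat: ismember returns the lowest matching row index, so if index contained duplicate rows, only the first of them would be flagged.

```matlab
index = [1 2 3; 13 14 15];                        % GxN
A = [1 2 3; 5 6 7; 21 22 23; 1 2 3; 13 14 15];    % MxN
G = size(index, 1);
M = size(A, 1);

% Loc(m) is the row of index that equals A(m,:), or 0 if there is none
[~, Loc] = ismember(A, index, 'rows');

% Dense GxM result, written with bsxfun instead of implicit expansion
ResponseDense = bsxfun(@eq, (1:G).', Loc.');

% Sparse GxM logical result: only the matching (g,m) pairs are stored
hit = Loc > 0;
ResponseSparse = sparse(Loc(hit), find(hit), true, G, M);
```

Calling full(ResponseSparse) converts the sparse result back to an ordinary logical matrix if later code needs one.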
Dev-iL

Approach 1: computing distances

If you have the Statistics Toolbox:

Response = ~(pdist2(index, A));

or:

Response = ~(pdist2(index, A, 'hamming'));

This works because pdist2 computes the distance between each pair of rows. Equal rows have distance 0. The logical negation ~ gives 1 for those pairs of rows, and 0 otherwise.
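
The same idea can be sketched without the Statistics Toolbox by counting mismatching columns directly. This is not what pdist2 does internally; it is a hand-written stand-in for the Hamming variant, which reports the fraction of differing coordinates where the loop below keeps the raw count. The variable D is illustrative.

```matlab
index = [1 2 3; 13 14 15];                        % GxN
A = [1 2 3; 5 6 7; 21 22 23; 1 2 3; 13 14 15];    % MxN
[G, N] = size(index);
M = size(A, 1);

% D(g,m) counts the columns in which index(g,:) and A(m,:) differ
D = zeros(G, M);
for k = 1:N
    D = D + bsxfun(@ne, index(:,k), A(:,k).');
end

% A count of zero means the two rows are identical
Response = ~D;
```

The loop only ever holds GxM arrays, so it avoids building a GxMxN intermediate.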

Approach 2: reducing rows to unique integer labels

This approach is faster on my machine:

[~,~,u] = unique([index; A], 'rows');
Response = bsxfun(@eq, u(1:G), u(G+1:end).');

It works by reducing rows to unique integer labels (using the third output of unique), and comparing the latter instead of the former.
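
As a small worked illustration of the labelling step, here it is on the example from the question. The variable names labelsIndex and labelsA are introduced only for this sketch.

```matlab
index = [1 2 3; 13 14 15];                        % GxN
A = [1 2 3; 5 6 7; 21 22 23; 1 2 3; 13 14 15];    % MxN
G = size(index, 1);

% The sorted unique rows are [1 2 3], [5 6 7], [13 14 15], [21 22 23],
% so each row of [index; A] is replaced by a label from 1 to 4
[~, ~, u] = unique([index; A], 'rows');           % u = [1; 3; 1; 2; 4; 1; 3]

labelsIndex = u(1:G);                             % [1; 3]
labelsA = u(G+1:end);                             % [1; 2; 4; 1; 3]

% Comparing the labels pairwise gives the GxM result
Response = bsxfun(@eq, labelsIndex, labelsA.');
```

Because equal rows always receive equal labels, this also flags every copy of a row that appears more than once in index.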

For your size values this takes approximately 1 second on my computer:

clear
N = 19; M = 500; G = 524288;
index = randi(5,G,N); A = randi(5,M,N);
tic
[~,~,u] = unique([index; A], 'rows');
Response = bsxfun(@eq, u(1:G), u(G+1:end).');
toc

gives

Elapsed time is 1.081043 seconds.
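
A rough sketch for comparing this approach with the ismember approach from the other answer on your own machine. The test data are an assumption: G is reduced from the question's value to keep the run short, duplicate rows are removed from index so that both methods must agree, and A is drawn from the rows of index so that matches actually occur. No timings are quoted because they depend on the machine and release.

```matlab
N = 19; M = 500; G = 2^14;                 % G reduced from the question's 524288
index = unique(randi(5, G, N), 'rows');    % drop duplicate rows
G = size(index, 1);
A = index(randi(G, M, 1), :);              % every row of A occurs in index

tic
[~, ~, u] = unique([index; A], 'rows');
R1 = bsxfun(@eq, u(1:G), u(G+1:end).');
tUnique = toc;

tic
[~, Loc] = ismember(A, index, 'rows');
R2 = bsxfun(@eq, (1:G).', Loc.');
tIsmember = toc;

fprintf('unique: %.3f s, ismember: %.3f s, same result: %d\n', ...
        tUnique, tIsmember, isequal(R1, R2));
```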
Luis Mendo
  • `findgroups` might be quicker than using the 3rd output of `unique`. Not done speed tests in the past but believe it does the same thing. – Wolfie Oct 03 '18 at 13:18
  • @Wolfie I think `findgroups` requires grouping variables to be vectors. So the matrices here would have to be split into their columns, which takes time. Also, `findgroups` internally uses the third output of `unique`, so I doubt it's faster – Luis Mendo Oct 03 '18 at 13:22
  • Ah ignore me then, didn't realise the two were intertwined! The 2nd option is quite a bit quicker than my reshaping method which surprised me. – Wolfie Oct 03 '18 at 13:24
  • @Wolfie Yes, for the OP's sizes your method takes 5 times more than my second approach on my computer. I guess when sizes are large it's beneficial to reduce a dimension as soon as possible (my second approach) even if that takes time – Luis Mendo Oct 03 '18 at 13:27

You could reshape the matrices so that the elements of each row lie along the 3rd dimension. Then we can use implicit expansion (use bsxfun in releases before R2016b) to test equality of all elements, and all to aggregate along that dimension (i.e. false if not all elements are equal for a given pair of rows).

Response = all( reshape( index, [], 1, size(index,2) ) == reshape( A, 1, [], size(A,2) ), 3 ); 

You might even be able to avoid some reshaping by using all in another dimension, but it's easier for me to visualise it this way.
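
For releases without implicit expansion, the same comparison can be sketched with bsxfun, shown here on the example from the question. The names index3 and A3 are illustrative.

```matlab
index = [1 2 3; 13 14 15];                        % GxN
A = [1 2 3; 5 6 7; 21 22 23; 1 2 3; 13 14 15];    % MxN
[G, N] = size(index);
M = size(A, 1);

index3 = reshape(index, G, 1, N);                 % G x 1 x N
A3 = reshape(A, 1, M, N);                         % 1 x M x N

% Element-wise equality expands to G x M x N; all(..., 3) collapses it to G x M
Response = all(bsxfun(@eq, index3, A3), 3);
```

Note that the intermediate logical array has G*M*N elements. At the question's sizes that is 524288*500*19, about 5e9 elements at one byte each, which is likely where much of the time and memory goes.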

Wolfie