If A and B are tables (or datasets) having the same columns (in the same order), then an expression like ismember(A(:, somecols), B(:, somecols)) will produce a boolean array suitable for indexing A, as in

A(ismember(A(:, somecols), B(:, somecols)), :)

The line above evaluates to a table (or dataset, depending on the class of A) consisting of those rows of A that match some row of B at the columns specified in somecols.
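As a minimal sketch of the row selection described above, assuming MATLAB R2013b or later (where table exists) and small made-up tables whose values and column names are purely illustrative:

```matlab
% Hypothetical example data, not taken from the question.
A = table([4;2;4;2], [1;6;4;7], [7;4;4;6], 'VariableNames', {'C1','C2','C3'});
B = table([4;4;8],   [1;4;8],   [7;4;9],   'VariableNames', {'C1','C2','C3'});
somecols = [2 3];

% One logical value per row of A: true where that row, restricted to
% somecols, equals some row of B restricted to the same columns.
mask = ismember(A(:, somecols), B(:, somecols));

% Logical row indexing keeps only the matching rows of A.
matched = A(mask, :);
```

Here mask has one element per row of A, which is what makes it usable as a row index.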

But now suppose that B has exactly one row. More realistically, suppose that the criterion for selecting rows from A is simply to match this one single row of B, say the first one.

One could do this:

A(ismember(A(:, somecols), B(1, somecols)), :)

The main quibble I have with this is that it is not "semantically clear", because ismember is being used, in effect, to test for equality.

It would be semantically clearer if one could write

A(isequal(A(:, somecols), B(1, somecols)), :)

but this line does not produce the desired results. (Specifically, it returns no matches even when A(:, ...) contains rows matching B(1, ...).)
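A likely reason, sketched here with made-up data, is that isequal compares its arguments as whole objects and returns a single logical value, whereas ismember on tables returns one value per row:

```matlab
% Hypothetical example data, not taken from the question.
A = table([4;2;4], [1;6;4], 'VariableNames', {'C1','C2'});
B = table([4;8],   [1;8],   'VariableNames', {'C1','C2'});

% isequal asks "are these two tables identical?" and answers with one scalar.
% A 3-row table is never equal to a 1-row table, so the answer is false.
tf = isequal(A, B(1, :));

% ismember answers per row of A, giving a 3-by-1 logical column.
mask = ismember(A, B(1, :));
```

A scalar false used as a row index selects nothing, which is consistent with the empty result described above.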

My question is, what is the predicate that will correctly produce the logical vector corresponding to the question "does this row of A match this reference row at somecols"?

chappjc
kjo
  • As @MrAzzaman's answer below suggests, you'd want `eq` (`==`), not `isequal` if you want to index in to the rows of `A`. These are not equivalent. Unfortunately, `eq` has not been written for the table type (probably for the same reasons that it isn't supported for `cell`, `struct`, etc.). If these seem unwieldy, you might consider making a set of utility functions for yourself that simplify the code you need to write. – horchler Feb 04 '14 at 01:43

2 Answers

For the table data type you can also use innerjoin, but ismember is fairly clear in this case. Consider the tables At and Bt, where Bt has two common rows and one unique row:

>> A = randi(7,4,5);
>> commonRows = [1 3];
>> B = [A(commonRows,:); randi(2,1,5)+7];
>> At = array2table(A,'VariableNames',sprintfc('C%d',1:size(A,2)))
At = 
    C1    C2    C3    C4    C5
    __    __    __    __    __
    4     1     5     7     7 
    2     6     5     1     4 
    4     4     6     7     4 
    2     7     7     5     6 
>> Bt = array2table(B,'VariableNames',sprintfc('C%d',1:size(A,2)))
Bt = 
    C1    C2    C3    C4    C5
    __    __    __    __    __
    4     1     5     7     7 
    4     4     6     7     4 
    8     8     9     9     9 

The second output argument of innerjoin, IA, gives you the indexes of rows in A that are also in B. As in your example, consider a subset of the columns, specified by somecols:

>> somecols = [2 5]
somecols =
     2     5
>> [Ct,IA] = innerjoin(At(:,somecols), Bt(1,somecols))
Ct = 
    C2    C5
    __    __
    1     7 
IA =
     1
>> [Ct,IA] = innerjoin(At(:,somecols), Bt(2,somecols))
Ct = 
    C2    C5
    __    __
    4     4 
IA =
     3
>> [Ct,IA] = innerjoin(At(:,somecols), Bt(3,somecols))
Ct = 
   empty 0-by-2 table
IA =
     []

Whether IA is empty is a suitable test for a match:

>> [~,IA] = innerjoin(At, Bt(3,:));
>> isempty(IA)
ans =
     1
>> [~,IA] = innerjoin(At, Bt(2,:));
>> isempty(IA)
ans =
     0

Or just test the first output, the common table rows:

>> isempty(innerjoin(At, Bt(3,:)))
ans =
     1
>> isempty(innerjoin(At, Bt(1,:)))
ans =
     0
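To get from IA back to the row selection asked about in the question, one possible sketch is to turn IA into a logical mask over the rows of At. The data is hard-coded from the transcript above, and the variable names mask and matched are illustrative assumptions:

```matlab
% At as printed above; Bt reduced to a single reference row for this sketch.
names = {'C1','C2','C3','C4','C5'};
At = array2table([4 1 5 7 7; 2 6 5 1 4; 4 4 6 7 4; 2 7 7 5 6], 'VariableNames', names);
Bt = array2table([4 4 6 7 4], 'VariableNames', names);
somecols = [2 5];

% With no keys given, innerjoin uses all variables the two tables share
% (here C2 and C5). IA holds the row numbers of the left table that matched.
[~, IA] = innerjoin(At(:, somecols), Bt(1, somecols));

% Convert the row numbers to a logical mask so rows keep their original order.
mask = false(height(At), 1);
mask(IA) = true;
matched = At(mask, :);
```

Because the right-hand table has exactly one row, each row of At can match at most once, so IA contains no repeated indices.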
chappjc

I agree that with the ismember option it may not be immediately clear what you are intending (though there is nothing exactly wrong with it). Another approach, which might be more semantically clear (though potentially less efficient), is to use bsxfun like so:

all(bsxfun(@eq,A(:,somecols),B(1,somecols)),2);

If you were to expand this into what is essentially happening under the hood, it would be something like:

a = A(:,somecols);
b = repmat(B(1,somecols),size(A,1),1);
abeq = all(a == b,2);
A(abeq,:);

Basically you're replicating the one B row so that it is the same size as A(:,somecols) and then comparing each value in each array. Finally, you're checking which rows have a whole row of true (by using all), which indicates it matches the single row of B.

EDIT: Sorry, apparently I misunderstood the question - if you're using the table datatype (which I didn't actually know existed until a few minutes ago - thanks horchler), then this approach probably won't work.
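If the selected columns are all numeric and of compatible types, one possible workaround is to pull them out of the table with brace indexing and apply the same bsxfun comparison to the resulting matrices. This is only a sketch with made-up data; it will not work for columns of mixed types that cannot be concatenated:

```matlab
% Hypothetical example data, not taken from the question.
A = table([4;2;4;2], [1;6;4;7], [7;4;4;6], 'VariableNames', {'C1','C2','C3'});
B = table([4;8],     [1;8],     [7;9],     'VariableNames', {'C1','C2','C3'});
somecols = [2 3];

% Brace indexing extracts the underlying data as an ordinary numeric array.
a = A{:, somecols};   % 4-by-2 matrix
b = B{1, somecols};   % 1-by-2 row vector

% Compare every row of a against b, then require all columns to agree.
mask = all(bsxfun(@eq, a, b), 2);
matched = A(mask, :);
```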

EDIT2: Notlikethat pointed out the existence of the function rowfun, which acts on each row in a table. I can't test this (my version of MATLAB isn't new enough) but I assume that something like this would do what you are wanting:

A(rowfun(@(x) isequal(B(1,somecols),x),A(:,somecols)),:);
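A hedged variant of the same idea: as far as I understand the rowfun documentation, it passes each table variable to the function as a separate argument and returns a table unless 'OutputFormat' is set to 'uniform'. Under those assumptions, and assuming numeric columns and made-up data, a sketch might look like this:

```matlab
% Hypothetical example data, not taken from the question.
A = table([4;2;4;2], [1;6;4;7], [7;4;4;6], 'VariableNames', {'C1','C2','C3'});
B = table([4;8],     [1;8],     [7;9],     'VariableNames', {'C1','C2','C3'});
somecols = [2 3];

% Reference row as a plain numeric vector (assumes numeric columns).
ref = B{1, somecols};

% varargin collects the per-variable arguments of one row; concatenating them
% rebuilds that row as a vector, which is then compared with the reference.
% 'uniform' output makes rowfun return a logical column instead of a table.
mask = rowfun(@(varargin) isequal([varargin{:}], ref), ...
              A(:, somecols), 'OutputFormat', 'uniform');
matched = A(mask, :);
```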
MrAzzaman
  • FYI, `bsxfun` and `eq` aren't defined for the `table` datatype unfortunately so neither of your code snippets will work without conversion to another format (which isn't always possible). – horchler Feb 04 '14 at 01:27
  • Is `table` a new MATLAB command? I'm still running R2012a, and it doesn't appear to exist for me. – MrAzzaman Feb 04 '14 at 01:29
  • I think that it's [new as of R2013b](http://www.mathworks.com/help/matlab/release-notes.html#btz8s39). I thought that there was something previous but I can't find it. There is [`dataset`](http://www.mathworks.com/help/stats/dataset.html) in the Statistics toolbox which is [supposed to be similar](http://www.mathworks.com/matlabcentral/answers/86779-whats-the-difference-between-a-table-new-in-r2013b-and-a-dataset-stats-toolbox). – horchler Feb 04 '14 at 01:32
  • Tables also appear to have brought [`rowfun()`](http://www.mathworks.co.uk/help/matlab/ref/rowfun.html) to the party, for an ever-so-slightly-semantically-cleaner version of this approach – Notlikethat Feb 04 '14 at 01:39
  • @Notlikethat: Nice find! There's also [`varfun`](http://www.mathworks.com/help/matlab/ref/varfun.html). All `table` method/properties are listed [here](http://www.mathworks.com/help/matlab/tables.html). – horchler Feb 04 '14 at 01:46