
Let's say we have a vector A with 1000 values and a matrix B of size MxN.

How can I select only those values from B which appear in A, and make a matrix with M rows and an equal number of columns in each row?

Also, this is a question about classification using mutual information: A contains the mutual information values and B contains the test dataset.

EDIT: The values in A are derived from a mutual information algorithm applied to another dataset of size 500x1001, where 500 is the number of samples and 1000 is the feature vector size; the first column holds the class of each sample. Matrix B consists only of the test samples, with feature vectors and no class.

Abhishek Thakur
  • How can you guarantee that the number of values that appear both in `A` and `B` is divisible by _M_? You see, if it's not, you won't be able to make up a matrix with M rows from it. – Eitan T May 23 '13 at 11:28
  • it is not. see edit please – Abhishek Thakur May 23 '13 at 11:33
  • Still unclear. Which rows from `B` do you want to keep? Are those the ones where all values appear in `A`, or perhaps at least one value? – Eitan T May 23 '13 at 11:42

2 Answers


how can I select only those values from B which appear in A ...

You can use the ismember function for that.
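A minimal sketch of that selection, using small made-up A and B in place of the real data. Indexing B with the logical mask from ismember returns the matching elements as a column vector, in MATLAB's column-major order, so the row structure of B is lost at this step:

```matlab
% Hypothetical small inputs standing in for the real A (values) and B (M-by-N data)
A = [2 5 7];
B = [1 2 3;
     5 6 7];

mask = ismember(B, A);   % logical M-by-N mask, true where B(i,j) appears in A
vals = B(mask);          % matching elements as a column vector, column-major order
```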

... and make a matrix with M rows and an equal number of columns in each row?

Are you sure that each row in B will have the same number of elements that occur in A? If not, this won't work.
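One way to check that condition before building the matrix, sketched with made-up data: count the matches per row, and reshape only when every row has the same count. Transposing B and the mask first makes linear indexing walk along the rows of B, so each row's matching values stay together and in their original order:

```matlab
% Hypothetical small inputs
A = [2 5 7 9];
B = [2 3 5;
     7 1 9];

mask = ismember(B, A);          % true where B(i,j) appears in A
counts = sum(mask, 2);          % number of matches in each row of B

if all(counts == counts(1))
    Bt = B.';                   % transpose so column-major order follows the rows of B
    Bnew = reshape(Bt(mask.'), counts(1), []).';   % M-by-counts(1) matrix
else
    error('Rows of B contain different numbers of values from A.');
end
```

If the counts differ, there is no rectangular M-row result, which is the situation the question above warns about.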

Marc Claesen

Use ismember to find which members of B appear in A:

ismember(B, A)

The result is a logical mask of the same dimensions as B, which you can then manipulate as you wish. To keep the rows of B that contain only elements that appear in A, do this:

Bnew = B(all(ismember(B, A), 2), :)
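A small worked example of the row filter, with made-up data. Replacing all with any gives the other reading raised in the comments, keeping rows where at least one value appears in A. Both keep whole rows, so the result always has N columns:

```matlab
% Hypothetical small inputs
A = [1 2 3 4];
B = [1 2 3;
     4 5 6;
     3 3 1];

inA = ismember(B, A);          % logical mask, same size as B

Bnew = B(all(inA, 2), :);      % rows where every element appears in A
Bany = B(any(inA, 2), :);      % rows where at least one element appears in A
```

Here Bnew keeps the first and third rows, while Bany keeps all three, since the second row contains 4.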

Handling floating-point numbers

If your data contains floating-point numbers, the ismember approach may fail, because exact floating-point comparisons are unreliable (as pointed out by Amro). So here's an alternative (similar to another answer of mine) that compares values within a small tolerance and is therefore robust to floating-point round-off:

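x = reshape(B, 1, 1, []);   % elements of B laid out along the third dimension
idx = reshape(any(abs(bsxfun(@minus, x, A(:).')) < eps, 2), size(B));   % A(:).' forces A to a row vector
Bnew = B(idx);              % matching values of B, returned as a column vector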

Essentially this is a one-liner, but I've split it into separate commands for clarity:

  • x holds the elements of B to be searched, laid out along the third dimension.
  • bsxfun subtracts every value of A from every element of x, and the magnitude of each difference is compared to a small threshold (e.g. eps).
  • any(..., 2) marks each element of B that lies within the threshold of at least one value in A. The result is reshaped back into a logical matrix idx with the dimensions of B, which serves as an index matrix to select the values that appear in A.
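A sketch showing why the tolerance matters, with made-up data: 0.1 + 0.2 is not bitwise equal to 0.3, so ismember misses it, while the bsxfun comparison above catches it. This version uses an assumed absolute tolerance of 1e-12 instead of eps, since eps is the spacing of doubles near 1 and is effectively an exact comparison for larger values; the right tolerance depends on the scale of your data. It also combines the tolerant mask with the row filter from earlier:

```matlab
% Hypothetical small inputs
A = [0.3 0.7];
B = [0.1 + 0.2, 0.7;
     0.5,       0.7];

exactMask = ismember(B, A);           % exact comparison misses 0.1 + 0.2

tol = 1e-12;                          % assumed absolute tolerance, data-dependent
x = reshape(B, 1, 1, []);             % elements of B along the third dimension
d = abs(bsxfun(@minus, x, A(:).'));   % 1-by-numel(A)-by-numel(B) distances
tolMask = reshape(any(d < tol, 2), size(B));

rowsInA = B(all(tolMask, 2), :);      % rows whose every element matches A within tol
```

With the exact mask no row would survive the all test; with the tolerant mask the first row is kept. Newer MATLAB releases (R2015a and later) also provide ismembertol for this kind of comparison.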
Eitan T
  • perhaps one could use the form `ismember(A,B,'rows')`. Note that you shouldn't be doing comparison of floating-point numbers that way, use absolute difference with an appropriate threshold: `abs(x-y) – Amro May 23 '13 at 12:02
  • @Amro `A` is just a collection of values, no? Not N-length rows... in which case 'rows' is no good here. Good point about the floating-point numbers. – Eitan T May 23 '13 at 12:04
  • maybe you're right, the question is not very clear to be honest – Amro May 23 '13 at 12:09