Suppose I have a matrix A:

A = [1 2 3 6 7 8];

I would like to split this matrix into sub-matrices based on how relatively close the numbers are. For example, the above matrix must be split into:

B = [1 2 3];
C = [6 7 8];

I understand that I need to define some sort of criterion for this grouping, so I thought I'd take the absolute difference between each number and the next one, and define a limit up to which a number is allowed to stay in a group. The problem is that I cannot fix a static limit on the difference, since the matrices and sub-matrices will be changing.

Another example:

A = [5 11 6 4 4 3 12 30 33 32 12];

So, this must be split into:

B = [5 6 4 4 3];
C = [11 12 12];
D = [30 33 32];

Here, the matrix is split into three parts based on how close the values are. So the criterion for this matrix is different from the previous one, although what I want out of each matrix is the same: to separate it based on the closeness of its numbers. Is there any way I can specify a general set of conditions to make the criterion dynamic rather than static?

HansHirse
Matte

1 Answer

I'm afraid my answer comes too late for you, but maybe future readers with a similar problem can benefit from it.

In general, your problem calls for cluster analysis. Nevertheless, maybe there's a simpler solution to your actual problem. Here's my approach:

  1. First, sort the input A.
  2. To find a criterion to distinguish between "intraclass" and "interclass" elements, I calculate the differences between adjacent elements of A, using diff.
  3. Then, I calculate the median over all these differences.
  4. Next, I find the indices of all differences that are greater than or equal to three times the median, where the threshold is at least 1. (Depending on the actual data, this might be modified, e.g. using mean instead.) These are the indices where you will have to "split" the (sorted) input.
  5. Finally, I set up two vectors with the start and end indices of each "sub-matrix", and pass them to arrayfun to get a cell array with all desired "sub-matrices".

Now, here comes the code:

% Sort input (forced to a row vector, so that column inputs work, too),
% and calculate differences between adjacent elements
AA = sort(A(:).');
d = diff(AA);

% Calculate median over all differences
m = median(d);

% Find indices with a "significantly higher difference",
% i.e. greater than or equal to three times the median
% (the threshold is at least 1)
idx = find(d >= max(1, 3 * m));

% Set up proper start and end indices
start_idx = [1 idx+1];
end_idx = [idx numel(AA)];

% Generate cell array with desired vectors
out = arrayfun(@(x, y) AA(x:y), start_idx, end_idx, 'UniformOutput', false)
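To make the intermediate steps easier to follow, here is a small trace for the second example from the question. It only repeats the steps above with the values written out as comments; the variable name thr is introduced here for illustration and is not part of the code above.

```matlab
% Trace of the intermediate values for the second example input
A = [5 11 6 4 4 3 12 30 33 32 12];

AA = sort(A);                  % 3 4 4 5 6 11 12 12 30 32 33
d = diff(AA);                  % 1 0 1 1 5 1 0 18 2 1
m = median(d);                 % 1
thr = max(1, 3 * m);           % 3 (thr is an illustrative name)
idx = find(d >= thr);          % 5 8  (the gaps 6 -> 11 and 12 -> 30)

start_idx = [1 idx+1];         % 1 6 9
end_idx = [idx numel(AA)];     % 5 8 11
```

The two gaps of 5 and 18 are the only differences that reach the threshold of 3, so the sorted input is cut after its 5th and 8th elements.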

Due to the unknown number of possible vectors, I can't think of a way to "unpack" these into individual variables.
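In practice, working with the cell array directly is often enough. The following is a minimal sketch, assuming out holds the three groups from the second example; the names sizes, B, C and D are only used for illustration.

```matlab
% Cell array as produced for the second example
out = {[3 4 4 5 6], [11 12 12], [30 32 33]};

% Access each group by index, without knowing the number of groups
for k = 1:numel(out)
  fprintf('Group %d: %s\n', k, mat2str(out{k}));
end

% Number of elements per group
sizes = cellfun(@numel, out);   % 5 3 3

% Only if the number of groups is known in advance (here: three),
% the cell array can be expanded into individual variables
[B, C, D] = out{:};
```

The last line fails if out has fewer than three elements, which is why the loop over out{k} is usually the safer choice.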

Some tests:

  A =
     1   2   3   6   7   8

  out =
  {
    [1,1] =
       1   2   3

    [1,2] =
       6   7   8
  }


  A =
      5   11    6    4    4    3   12   30   33   32   12

  out =
  {
    [1,1] =
       3   4   4   5   6

    [1,2] =
       11   12   12

    [1,3] =
       30   32   33
  }


  A =
     1   1   1   1   1   1   1   2   2   2   2   2   2   3   3   3   3   3   3   3

  out =
  {
    [1,1] =
       1   1   1   1   1   1   1

    [1,2] =
       2   2   2   2   2   2

    [1,3] =
       3   3   3   3   3   3   3
  }
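As mentioned in step 4, the statistic and the factor may need adjusting for other data. The following sketch wraps the approach into anonymous functions so that both can be swapped; the names split_at, gap_idx and group_by_gaps are hypothetical helpers, not built-in functions.

```matlab
% Hypothetical helpers: split a sorted row vector AA after the indices idx
split_at = @(AA, idx) arrayfun(@(s, e) AA(s:e), ...
    [1 idx+1], [idx numel(AA)], 'UniformOutput', false);

% Indices of gaps that are at least max(1, factor * statfun(all gaps))
gap_idx = @(AA, statfun, factor) ...
    find(diff(AA) >= max(1, factor * statfun(diff(AA))));

% Sort the input (as a row vector) and split it at the detected gaps
group_by_gaps = @(A, statfun, factor) ...
    split_at(sort(A(:).'), gap_idx(sort(A(:).'), statfun, factor));

A = [5 11 6 4 4 3 12 30 33 32 12];
g_median = group_by_gaps(A, @median, 3);   % three groups, as above
g_mean = group_by_gaps(A, @mean, 3);       % only two groups
```

For this input the mean of the differences is 3, so the threshold becomes 9 and only the gap of 18 is detected: the groups around 5 and 12 are merged. The mean is pulled up by the large gaps themselves, so a smaller factor would probably be needed when using it.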

Hope that helps!

HansHirse
  • I've completed working on the problem but this helped me understand cluster analysis better, thank you for answering :) – Matte Nov 04 '19 at 07:36