5

I am looking for a 'good' way to find a matrix (pattern) in a larger matrix (arbitrary number of dimensions).

Example:

total = rand(3,4,5);
sub = total(2:3,1:3,3:4);

Now I want this to happen:

loc = matrixFind(total, sub)

In this case loc should become [2 1 3].

For now I am just interested in finding one single point (if it exists) and am not worried about rounding issues. It can be assumed that sub 'fits' in total.


Here is how I could do it for 3 dimensions, however it just feels like there is a better way:

total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
loc = [];
for x = 1:size(total,1)-size(sub,1)+1
    for y = 1:size(total,2)-size(sub,2)+1
        for z = 1:size(total,3)-size(sub,3)+1
            block = total(x:x+size(sub,1)-1,y:y+size(sub,2)-1,z:z+size(sub,3)-1);
            if isequal(sub,block)
                loc = [x y z]
            end
        end
    end
end

I hope to find a workable solution for an arbitrary number of dimensions.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Dennis Jaheruddin
  • 21,208
  • 8
  • 66
  • 122
  • Not sure if it will facilitate the solution, but can `ndims(sub)` assumed to be equal to `ndims(total)`? – ojdo Sep 16 '13 at 14:56
  • 3
    Just as a note: For the 2D-only case, function [`findsubmat`](http://www.mathworks.com/matlabcentral/fileexchange/23998-findsubmat) in Matlab File Exchange has pretty good implementation ideas (and code comments). – ojdo Sep 16 '13 at 15:07
  • A bit related to the first question by @ojdo: I think you have to define more precise what output you want in case `ndim(sub) < ndims(total)`. I guess that in that case, your for-loop might not find all possible solutions. Requiring that `ndims(sub) == ndims(total)` probably simplifies things a bit. – Bas Swinckels Sep 16 '13 at 15:48
  • @BasSwinckels if I try `sub = total(2,1,3)` it seems to be ok. `ndims(sub)` is 2 and `ndims(total)` is 3. It helps that `size(sub,3)` is defined even if the dimension is singleton. Either way, there may be a bug in my code, but I hope the goal is clear: I just want it to work as long as `subs` 'fits' in `total`. – Dennis Jaheruddin Sep 16 '13 at 15:52
  • @DennisJaheruddin One problem is the fact that Matlab does not implement a true N-dimensional array, but does some weird things: it leaves off trailing singleton dimensions (`size(zeros(3,2,1)) == [3,2]`, while `size(zeros(1,2,3)) == [1,2,3]`) and scalars are always 2-D (`size(1) == [1,1]`). But my main issue is which dimension to compare with which in case `sub` has less dimensions than `total`. – Bas Swinckels Sep 16 '13 at 15:58
  • @BasSwinckels I suppose that for some purposes you may want to allow for more flexibility, but lets just assume you can only compare things in the same direction, as if the dimensions each have different meanings. More technically: for comparison follow the convention of trailing dimensions for `sub`. – Dennis Jaheruddin Sep 16 '13 at 21:59

3 Answers3

4

Here is low-performance, but (supposedly) arbitrary dimensional function. It uses find to create a list of (linear) indices of potential matching positions in total and then just checks if the appropriately sized subblock of total matches sub.

function loc = matrixFind(total, sub)
%matrixFind find position of array in another array

    % initialize result
    loc = [];

    % pre-check: do all elements of sub exist in total?
    elements_in_both = intersect(sub(:), total(:));
    if numel(elements_in_both) < numel(unique(sub))
        % if not, return nothing
        return
    end

    % select a pivot element
    % Improvement: use least common element in total for less iterations
    pivot_element = sub(1);

    % determine linear index of all occurences of pivot_elemnent in total
    starting_positions = find(total == pivot_element);

    % prepare cell arrays for variable length subscript vectors
    [subscripts, subscript_ranges] = deal(cell([1, ndims(total)]));


    for k = 1:length(starting_positions)
        % fill subscript vector for starting position
        [subscripts{:}] = ind2sub(size(total), starting_positions(k));

        % add offsets according to size of sub per dimension
        for m = 1:length(subscripts)
            subscript_ranges{m} = subscripts{m}:subscripts{m} + size(sub, m) - 1;
        end

        % is subblock of total equal to sub
        if isequal(total(subscript_ranges{:}), sub)
            loc = [loc; cell2mat(subscripts)]; %#ok<AGROW>
        end
    end
end
ojdo
  • 8,280
  • 5
  • 37
  • 60
  • 1
    `if numel(elements_in_both) < numel(sub)` fails if there are double elements in sub. maybe replace with `numel(unique(sub))` ? – Gelliant Jan 16 '18 at 12:21
  • @Gelliant: yes, good catch, I have added your suggestion to the answer. Thanks! – ojdo Aug 03 '18 at 12:23
2

This is based on doing all possible shifts of the original matrix total and comparing the upper-leftmost-etc sub-matrix of the shifted total with the sought pattern subs. Shifts are generated using strings, and are applied using circshift.

Most of the work is done vectorized. Only one level of loops is used.

The function finds all matchings, not just the first. For example:

>> total = ones(3,4,5,6);
>> sub = ones(3,3,5,6);
>> matrixFind(total, sub)
ans =

     1     1     1     1
     1     2     1     1

Here is the function:

function sol = matrixFind(total, sub)

nd = ndims(total);
sizt = size(total).';
max_sizt = max(sizt);
sizs = [ size(sub) ones(1,nd-ndims(sub)) ].'; % in case there are
% trailing singletons

if any(sizs>sizt)
    error('Incorrect dimensions')
end

allowed_shift = (sizt-sizs);
max_allowed_shift = max(allowed_shift);
if max_allowed_shift>0
    shifts = dec2base(0:(max_allowed_shift+1)^nd-1,max_allowed_shift+1).'-'0';
    filter = all(bsxfun(@le,shifts,allowed_shift));
    shifts = shifts(:,filter); % possible shifts of matrix "total", along 
    % all dimensions
else
    shifts = zeros(nd,1);
end

for dim = 1:nd
    d{dim} = 1:sizt(dim); % vectors with subindices per dimension
end
g = cell(1,nd);
[g{:}] = ndgrid(d{:}); % grid of subindices per dimension
gc = cat(nd+1,g{:}); % concatenated grid
accept = repmat(permute(sizs,[2:nd+1 1]), [sizt; 1]); % acceptable values
% of subindices in order to compare with matrix "sub"
ind_filter = find(all(gc<=accept,nd+1));

sol = [];
for shift = shifts
    total_shifted = circshift(total,-shift);
    if all(total_shifted(ind_filter)==sub(:))
        sol = [ sol; shift.'+1 ];
    end
end
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
1

For an arbitrary number of dimensions, you might try convn.

C = convn(total,reshape(sub(end:-1:1),size(sub)),'valid'); % flip dimensions of sub to be correlation
[~,indmax] = max(C(:));
% thanks to Eitan T for the next line
cc = cell(1,ndims(total)); [cc{:}] = ind2sub(size(C),indmax); subs = [cc{:}]

Thanks to Eitan T for the suggestion to use comma-separated lists for a generalized ind2sub.

Finally, you should test the result with isequal because this is not a normalized cross correlation, meaning that larger numbers in a local subregion will inflate the correlation value potentially giving false positives. If your total matrix is very inhomogeneous with regions of large values, you might need to search other maxima in C.

chappjc
  • 30,359
  • 6
  • 75
  • 132
  • 1
    This looks promising, but two things: 1. The peak is not guaranteed to correspond with the "top left corner" of the submatrix. 2. You can obtain the output parameters of `ind2sub` using comma-separated lists (see [here](http://stackoverflow.com/q/8918137/1336150) for example). – Eitan T Sep 16 '13 at 16:45
  • What if we compare `sub` with `2*total` it still returns a match. Or if we have some elements in `total` that are some order of magnitude larger than the values in `sub`, then those regions will be possible matches. You need to consider power of the signals that are being matched if you are thinking along the lines of matched filter. – Mohsen Nosratinia Sep 16 '13 at 17:15
  • @Mohsen Nosratinia, You are correct -- Hence the last paragraph of my original answer. A normalized cross-correlation would be best. – chappjc Sep 16 '13 at 17:38
  • This solution is surprisingly compact, however I don't really understand what it can and can't do. For example, it fails for the trivial case: `total = [1 2 3]; subs = 2`. Is there a way to use this solution without worrying for false positives or negatives? – Dennis Jaheruddin Sep 16 '13 at 21:41
  • When the template/`sub` matrix is small, the likelihood of a false positive becomes high. Also, as Mohsen and I have both pointed out, you are very likely to get false positives in regions of the `total` matrix that are significantly larger than the rest of the matrix. So, this is admittedly not a full solution, unless you implement N-dimensional normalized cross correlation (NCC), which does not have this drawback. This is harder to do with typical FFT solutions and _very_ slow the naive way with windowed sums -- too slow to be worth doing vs. a simple equivalence test at every location! – chappjc Sep 16 '13 at 22:10