Find if a small matrix exists in a big matrix in O(n)

Question

I was asked this question in an interview:

Given matrix A and matrix B, I have to write a program to find out whether matrix B exists in matrix A.

The problem is that I have to do it in O(n) time. This is the only approach I have come up with:

public class Matrix {
    public static void main(String[] args) {
        boolean flag = false;
        int a[][] = {
                {1, 2, 3, 4},
                {5, 6, 7, 8},
                {9, 10, 11, 12},
                {13, 14, 15, 16}};

        int b[][] = {
                {11, 12},
                {15, 16}};

        for (int i = 0; i < a.length - b.length + 1; i++) {
            for (int j = 0; j < a[0].length - b[0].length + 1; j++) {
                if (a[i][j] == b[0][0]) {
                    flag = true;
                    for (int k = 0; k < b.length && flag; k++) { // stop scanning rows after the first mismatch
                        for (int l = 0; l < b[0].length; l++) {
                            if (a[i + k][j + l] != b[k][l]) {
                                flag = false;
                                break;
                            }
                        }
                    }
                    if (flag) {
                        System.out.println("i= " + i + " j= " + j);
                        return;
                    }
                }
            }
        }
    }
}

I don't know how to convert it to O(n).

Is there any technique to search for whether a small matrix B exists in a big matrix A in O(n)?

Assuming by O(n) you mean linear in terms of matrix size, this can be done with hashing. You can try looking at [this](https://stackoverflow.com/questions/1975386/fast-counting-of-2d-sub-matrices-withing-a-large-dense-2d-matrix) — Photon, Mar 23 '20 at 08:11
Is `n` the linear size of the "outer" matrix (i.e. the matrix is `n x n`) or the number of elements of the "outer" matrix (i.e. `n = m x m`)? In the first case, I'd say it is impossible. — norok2, Mar 23 '20 at 08:25
Does this answer your question? [Fast counting of 2D sub-matrices withing a large, dense 2D matrix?](https://stackoverflow.com/questions/1975386/fast-counting-of-2d-sub-matrices-withing-a-large-dense-2d-matrix) — norok2, Mar 23 '20 at 23:01

score 0 · Answer 1 · answered Mar 23 '20 at 08:15

You can use a 2D rolling hash.

Given the (large) input matrix A[N][N], and smaller input matrix M[K][K], construct a new matrix H1[N][N-K+1] by hashing each K consecutive elements in each row like this:

 H1[i][j] = hash(A[i][j], A[i][j+1], ..., A[i][j+K-1])

If your hash function is chosen to be a rolling hash function (look it up), this runs in linear time, because you can construct H1[i][j+1] from H1[i][j] in O(1) time.
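To make the O(1) update concrete, here is a minimal sketch of a polynomial rolling hash over one row. The class name `RollingRowHash`, the method `windowHashes`, and the constants `BASE` and `MOD` are illustrative assumptions, not something prescribed by this answer; any rolling hash with an O(1) slide would do.

```java
import java.util.Arrays;

class RollingRowHash {
    // Illustrative constants (assumptions): a large prime modulus and a small base.
    static final long MOD = 1_000_000_007L;
    static final long BASE = 131L;

    // Maps any int (including negatives) into [0, MOD).
    static long norm(long v) {
        return Math.floorMod(v, MOD);
    }

    // Returns the hash of every window of k consecutive elements of row.
    // Each hash is derived from the previous one in O(1).
    static long[] windowHashes(int[] row, int k) {
        int n = row.length;
        if (k <= 0 || k > n) return new long[0];

        // top = BASE^(k-1) mod MOD, the weight of the element leaving the window.
        long top = 1;
        for (int t = 1; t < k; t++) top = top * BASE % MOD;

        long[] out = new long[n - k + 1];
        long h = 0;
        for (int j = 0; j < k; j++) h = (h * BASE + norm(row[j])) % MOD;
        out[0] = h;

        for (int j = 1; j + k <= n; j++) {
            // Drop row[j-1] from the front, then append row[j+k-1] at the back.
            h = (h - norm(row[j - 1]) * top % MOD + MOD) % MOD;
            h = (h * BASE + norm(row[j + k - 1])) % MOD;
            out[j] = h;
        }
        return out;
    }

    public static void main(String[] args) {
        // Windows {1,2}, {2,3}, {3,4} hash to 1*131+2, 2*131+3, 3*131+4.
        System.out.println(Arrays.toString(windowHashes(new int[]{1, 2, 3, 4}, 2)));
        // prints [133, 265, 397]
    }
}
```

Calling `windowHashes` once per row of A would give one row of H1 in time proportional to the row length.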

Next, hash up the columns, by constructing a new matrix H2[N-K+1][N-K+1]:

 H2[i][j] = hash(H1[i][j], H1[i+1][j], ..., H1[i+K-1][j])

Apply the same procedure to your smaller matrix (which produces a matrix with a single element).

Now, compare the single hash value from the smaller matrix with each element of H2, and if they are equal, you almost certainly have a match (you can check element-wise).
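The whole procedure could be sketched roughly as follows. This is only one possible reading of the steps above: the class `SubmatrixSearch`, its helper names, and the constants `MOD`, `ROW_BASE` and `COL_BASE` are assumptions made for illustration. It assumes rectangular, non-empty matrices and generalizes slightly to a K x L pattern in an N x M matrix.

```java
class SubmatrixSearch {
    // Illustrative constants (assumptions): one modulus, a different base per direction.
    static final long MOD = 1_000_000_007L;
    static final long ROW_BASE = 131L;
    static final long COL_BASE = 137L;

    // Hashes of all length-w windows of seq (values in [0, MOD)), each updated in O(1).
    static long[] slide(long[] seq, int w, long base) {
        long top = 1; // base^(w-1) mod MOD
        for (int t = 1; t < w; t++) top = top * base % MOD;
        long[] out = new long[seq.length - w + 1];
        long h = 0;
        for (int j = 0; j < seq.length; j++) {
            if (j >= w) h = (h - seq[j - w] * top % MOD + MOD) % MOD; // drop oldest element
            h = (h * base + seq[j]) % MOD;                            // append newest element
            if (j >= w - 1) out[j - w + 1] = h;
        }
        return out;
    }

    // h2[i][j] = hash of the k x l block of x whose top-left corner is (i, j).
    static long[][] hash2d(int[][] x, int k, int l) {
        int n = x.length, m = x[0].length;
        long[][] h1 = new long[n][]; // step 1: hash l consecutive elements in each row
        for (int i = 0; i < n; i++) {
            long[] row = new long[m];
            for (int j = 0; j < m; j++) row[j] = Math.floorMod((long) x[i][j], MOD);
            h1[i] = slide(row, l, ROW_BASE);
        }
        int cols = m - l + 1;
        long[][] h2 = new long[n - k + 1][cols]; // step 2: hash k consecutive row hashes per column
        long[] col = new long[n];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < n; i++) col[i] = h1[i][j];
            long[] down = slide(col, k, COL_BASE);
            for (int i = 0; i < down.length; i++) h2[i][j] = down[i];
        }
        return h2;
    }

    // Element-wise check, used to rule out hash collisions.
    static boolean matchesAt(int[][] a, int[][] b, int i, int j) {
        for (int r = 0; r < b.length; r++)
            for (int c = 0; c < b[0].length; c++)
                if (a[i + r][j + c] != b[r][c]) return false;
        return true;
    }

    // Returns {i, j} of the first occurrence of b in a, or null if there is none.
    static int[] find(int[][] a, int[][] b) {
        int n = a.length, k = b.length;
        if (n == 0 || k == 0 || k > n) return null;
        int m = a[0].length, l = b[0].length;
        if (l == 0 || l > m) return null;
        long target = hash2d(b, k, l)[0][0]; // the pattern collapses to a single hash
        long[][] h2 = hash2d(a, k, l);
        for (int i = 0; i < h2.length; i++)
            for (int j = 0; j < h2[i].length; j++)
                if (h2[i][j] == target && matchesAt(a, b, i, j)) return new int[]{i, j};
        return null;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
        int[][] b = {{11, 12}, {15, 16}};
        int[] hit = find(a, b);
        System.out.println(hit == null ? "not found" : "i= " + hit[0] + " j= " + hit[1]);
        // prints i= 2 j= 2
    }
}
```

For simplicity this sketch stores the intermediate hash matrices in full, so it uses memory proportional to the size of A. The element-wise check only runs when the hashes agree, so with few collisions the running time stays close to linear in the number of elements of A plus those of the pattern.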

Uhm, but does this account for the cost of building the hash? It looks to me like H1 is ~O(N²) in memory, so constructing it seems to require ~O(N²) time (not considering K). — norok2, Mar 23 '20 at 08:45
@norok2 there is an ambiguity: linear time in size of matrix means O(N^2) time, since the matrix is of size N^2. Obviously no algorithm to solve this can be O(N), as you state in your answer. — Paul Hankin, Mar 23 '20 at 10:49

norok2 · Answer 2 · 2020-03-24T01:36:26.513

(EDITED)

Assume you have a matrix A of size n x m and a matrix B of size k x l. The naive approach to finding occurrences of B in A has a time complexity of O(n m k l) with an O(1) memory requirement.

In general, you can easily prove that you cannot do better than O(n m) by considering the case k = l = 1, which requires checking all elements of the containing matrix, hence O(n m). This is the same reason why string-search algorithms cannot be (globally) sub-linear.

I assume that your requirement of being O(N) translates more properly into a requirement of being O(n m). If this were possible, a similar algorithm could presumably be adapted to the string-search problem with O(n) complexity (n being the size of the input), independent of the pattern size k. No such algorithm has been found (and probably none exists). For this reason, I tend to believe that what you are looking for is, if possible at all, currently beyond human knowledge.

Instead, based on the string-search algorithms literature, what you could aim for is to get to O(n m + k l) complexity.

A possible approach would be to adapt one of the aforementioned string-search algorithms to this problem; you should then be able to get similar time complexities and memory requirements.

For example, both your algorithm and @PaulHankin's answer describe an adaptation of the Rabin-Karp algorithm to the 2D case. Your version uses a really poor hash (the first element of each matrix). If you were to compute a more appropriate hash, like a rolling hash (as suggested, but not provided, in @PaulHankin's answer, at least at the time of writing), you would be able to skip the two innermost loops most of the time. The rolling hash would also make sure that you are not adding extra input-size-dependent complexity to the algorithm. The result would be O(n m + k l) time complexity (the O(k l) comes from computing the hash on B) and an O(1) memory requirement.

Adapting other string-search algorithms (like the Knuth-Morris-Pratt (KMP) algorithm or the Two-way string-search (2WSS) algorithm) may require some "linearization" of the algorithm (not just of the problem formulation). This would mean using modulo arithmetic to find the correct offsets under all circumstances, which may be tedious, but I do not see a reason why this would not be possible or would make you lose the expected complexities.

Another option would be to adapt string-search algorithms to work interleaved in each dimension. But again this may prove as hard as working with some "linearized" problem.

The final message here is that it is definitely possible to improve on O(n m k l) and eventually reach O(n m + k l), but it is not easy.

Can you explain how to linearize the problem? Are you sure it's possible? — Paul Hankin, Mar 23 '20 at 10:50
You've shown how you intend to linearize the input to the problem, but a sub-matrix is not contiguous characters in this linearized form, so how can string-search find a particular sub-matrix? — Paul Hankin, Mar 23 '20 at 12:51
@PaulHankin ...which means some of the `+1` in the algorithm must be `+offset` to be determined. That is what I meant with the last line. — norok2, Mar 23 '20 at 13:04
How do you adapt any string search algorithm to do this though? You need to choose +1 or +offset depending on how much initial segment of a string you've already matched, but I believe that would mess up the incrementality of most algorithms. I think you're skating over details that are harder than you believe. Perhaps you can sketch what a modified KMP would look like if not. — Paul Hankin, Mar 23 '20 at 13:35
@PaulHankin I am not saying that the adaptation is easy, and properly keeping track of all the offsets would be quite tedious. Yet I do not see a reason why this would not be possible. — norok2, Mar 23 '20 at 21:48
