Your current algorithm is O((m - p) * (n - q) * p * q). The worst case is when p = m / 2 and q = n / 2.
The algorithm I'm going to describe will be O(m * n + p * q), which will be O(m * n) regardless of p and q.
The algorithm consists of 2 steps.
Let the input matrix A's size be m x n
and the size of the window matrix being p x q
.
First, you will create a precomputed matrix B of the same size as the input matrix. Each element of the precomputed matrix B contains the sum of all the elements in the sub-matrix, whose top-left element is at coordinate (1, 1) of the original matrix, and the bottom-right element is at the same coordinate as the element that we are computing.
B[i, j] = Sum[k = 1..i, l = 1..j]( A[k, l] ) for all 1 <= i <= m, 1 <= j <= n
This can be done in O(m * n), by using this relation to compute each element in O(1):
B[i, j] = B[i - 1, j] + Sum[k = 1..j-1]( A[i, k] ) + A[j] for all 2 <= i <= m, 1 <= j <= n
B[i - 1, j]
, which is everything of the sub-matrix we are computing except the current row, has been computed previously. You keep a prefix sum of the current row, so that you can use it to quickly compute the sum of the current row.
This is another way to compute B[i, j]
in O(1), using the property of the 2D prefix sum:
B[i, j] = B[i - 1, j] + B[i, j - 1] - B[i - 1, j - 1] + A[j] for all 1 <= i <= m, 1 <= j <= n and invalid entry = 0
Then, the second step is to compute the result matrix S whose size is p x q
. If you make some observation, S[i, j] is the sum of all elements in the matrix size (m - p + 1) * (n - q + 1), whose top-left coordinate is (i, j) and bottom-right is (i + m - p + 1, j + n - q + 1).
Using the precomputed matrix B, you can compute the sum of any sub-matrix in O(1). Apply this to compute the result matrix S:
SubMatrixSum(top-left = (x1, y1), bottom-right = (x2, y2))
= B[x2, y2] - B[x1 - 1, y2] - B[x2, y1 - 1] + B[x1 - 1, y1 - 1]
Therefore, the complexity of the second step will be O(p * q).
The final complexity is as mentioned above, O(m * n), since p <= m and q <= n.