While pondering this question I came up with an algorithm that computes max pooling in O(mn) time when the 2D input array has size n*m.
Here is how. First consider the 1D case, where we need to find the maximum of every window of size k in an array. This can be done in linear time using a deque. See this for more details: https://www.geeksforgeeks.org/sliding-window-maximum-maximum-of-all-subarrays-of-size-k/
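The 1D deque technique can be sketched roughly as follows. This is only a minimal sketch: the function name `sliding_max` and its parameters are my own choices for illustration, not taken from the linked article.

```python
from collections import deque

def sliding_max(values, k):
    """Return the maximum of every length-k window of `values` (hypothetical helper)."""
    window = deque()  # holds indices; their values are in decreasing order
    result = []
    for i, v in enumerate(values):
        # Remove indices whose values can never be a window maximum again.
        while window and values[window[-1]] <= v:
            window.pop()
        window.append(i)
        # Remove the front index once it has slid out of the current window.
        if window[0] <= i - k:
            window.popleft()
        # The first full window ends at index k - 1.
        if i >= k - 1:
            result.append(values[window[0]])
    return result

print(sliding_max([5, 3, 2, 1, 4], 3))  # [5, 3, 4]
```

Each index is pushed and popped at most once, so the whole pass is linear in the length of the array.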
Assume that in the 2D case k is the filter size, the stride is 1, the padding is 0, and the input is an n*m matrix.
Step 1: Compute the sliding-window maximum of size k along every row. After doing this for all rows, entry (i, j) of the resulting matrix holds the maximum of row i over columns j to j+k-1.
Step 2: In the transformed matrix, do the same for the columns, that is, take the sliding-window maximum of size k over each column of the modified matrix. After that, position (i, j) of the matrix holds the maximum of the whole k*k sub-array that has cell (i, j) as its top-left corner and cell (i+k-1, j+k-1) as its bottom-right corner.
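The proof is simple.
After step 1, each entry already holds the maximum of k consecutive elements of its row. Taking the maximum over a window of k such entries along a column therefore gives the maximum over k rows of k elements each, which is the maximum over the whole k*k window.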
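Steps 1 and 2 could be put together along these lines. Again this is a sketch under my own assumptions: `sliding_max` and `max_pool_2d` are hypothetical names, the input is a plain list of lists, stride is 1 and there is no padding.

```python
from collections import deque

def sliding_max(values, k):
    """Maximum of every length-k window of `values`, using a deque of indices."""
    window = deque()
    result = []
    for i, v in enumerate(values):
        while window and values[window[-1]] <= v:
            window.pop()
        window.append(i)
        if window[0] <= i - k:
            window.popleft()
        if i >= k - 1:
            result.append(values[window[0]])
    return result

def max_pool_2d(matrix, k):
    """k*k max pooling with stride 1 and no padding, done as two 1D passes."""
    # Step 1: sliding maximum along every row.
    row_pass = [sliding_max(row, k) for row in matrix]
    # Step 2: sliding maximum along every column of the step 1 result.
    col_pass = [sliding_max(col, k) for col in zip(*row_pass)]
    # col_pass is stored column by column, so transpose it back to rows.
    return [list(row) for row in zip(*col_pass)]

# The 5*5 matrix from the example in this post, with k = 3.
matrix = [
    [5, 3, 2, 1, 4],
    [2, 3, 1, 5, 3],
    [1, 2, 3, 4, 6],
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
]
for row in max_pool_2d(matrix, 3):
    print(row)
# [5, 5, 6]
# [3, 5, 6]
# [5, 4, 6]
```

Both passes touch each element a constant number of times on average, so the total work is O(mn) and does not depend on k.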
Example
5 3 2 1 4
2 3 1 5 3
1 2 3 4 6
1 2 3 4 5
5 4 3 2 1
Assume n=5, m=5 and k=3.
After step 1 the modified matrix looks like:
5 3 4
3 5 5
3 4 6
3 4 5
5 4 3
Applying step 2 to this matrix then gives:
5 5 6
3 5 6
5 4 6
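One way to double-check the result above is a brute-force version that scans every k*k window directly. This is only a sanity-check sketch; `max_pool_naive` is a name I made up, and it costs O(mn k^2) rather than O(mn).

```python
def max_pool_naive(matrix, k):
    """Brute-force k*k max pooling with stride 1 and no padding (hypothetical helper)."""
    n, m = len(matrix), len(matrix[0])
    return [
        [
            # Maximum over the k*k window whose top-left corner is (i, j).
            max(matrix[i + di][j + dj] for di in range(k) for dj in range(k))
            for j in range(m - k + 1)
        ]
        for i in range(n - k + 1)
    ]

matrix = [
    [5, 3, 2, 1, 4],
    [2, 3, 1, 5, 3],
    [1, 2, 3, 4, 6],
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
]
for row in max_pool_naive(matrix, 3):
    print(row)
# [5, 5, 6]
# [3, 5, 6]
# [5, 4, 6]
```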
And that's it: we have the output of the max pooling layer in front of us.
Is this a good algorithm that could bring an optimization to CNN models, or is there a better existing algorithm for this? Please share your opinion.