The problem is similar to applying a 2D sliding-window max filter over an M x N image. Unlike the usual case, however, the window does not advance one pixel at a time but hops k pixels at a time. For example, with k = 2 the window is applied horizontally at pixels (0,0), (2,0), (4,0), ... and likewise vertically.
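To make the definition concrete, here is a minimal CPU reference sketch in C++. It is only an illustration: `stridedMaxFilter` is a hypothetical name (not an NPP function), and it assumes a row-major single-channel float image, a square `win` x `win` window whose top-left corner sits on the sampled pixel, and a window that is clipped at the image borders. Other anchor or border conventions would change the details.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of NPP: max filter evaluated only at
// every k-th pixel in x and y.
// Assumptions: row-major single-channel image, win x win window anchored
// at its top-left corner, window clipped at the right/bottom borders.
std::vector<float> stridedMaxFilter(const std::vector<float>& src,
                                    int width, int height,
                                    int win, int k,
                                    int& outWidth, int& outHeight)
{
    // One output per sampled position 0, k, 2k, ... that lies in the image.
    outWidth  = (width  + k - 1) / k;
    outHeight = (height + k - 1) / k;
    std::vector<float> dst(static_cast<std::size_t>(outWidth) * outHeight);

    for (int oy = 0; oy < outHeight; ++oy) {
        for (int ox = 0; ox < outWidth; ++ox) {
            const int x0 = ox * k;                       // window origin in the source
            const int y0 = oy * k;
            const int x1 = std::min(x0 + win, width);    // exclusive, clipped
            const int y1 = std::min(y0 + win, height);

            float m = src[static_cast<std::size_t>(y0) * width + x0];
            for (int y = y0; y < y1; ++y)
                for (int x = x0; x < x1; ++x)
                    m = std::max(m, src[static_cast<std::size_t>(y) * width + x]);

            dst[static_cast<std::size_t>(oy) * outWidth + ox] = m;
        }
    }
    return dst;
}
```

Each output pixel is independent of the others, so on the GPU the two outer loops would presumably map to one thread per output pixel. The output has roughly (M/k) x (N/k) pixels rather than M x N.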
For the 1-hop case, the CUDA NPP (NVIDIA Performance Primitives) library already provides an implementation: the nppiFilterMax* family of functions. However, I haven't been able to find a more general version for k-hop sliding. As a workaround, I could run the NPP function to compute the 1-hop max filter and then pick out the results at the corresponding positions, but this seems wasteful, since only about 1 in k^2 of the computed outputs would be kept. Is there any existing implementation or whitepaper for this problem?