The problem is similar to applying a sliding 2D max filter window over an M x N image. In contrast to the usual case, however, the window does not advance by 1 pixel at a time but by a k-pixel hop. For example, if k = 2, the window is applied horizontally at the pixels (0,0), (2,0), (4,0), ... and similarly in the vertical direction.
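
To make the definition concrete, here is a minimal CPU reference sketch in C++. It is not CUDA or NPP code, and the name `maxFilterHop` is hypothetical. It assumes a row-major image, a w x h window anchored at its top-left corner, and no border handling (only windows that fit entirely inside the image are evaluated).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an NPP function): max filter with a k-pixel hop.
// img is row-major with `height` rows of `width` pixels. The w x h window is
// anchored at its top-left corner, which is placed at (ox*k, oy*k).
// Assumption: only windows fully inside the image are evaluated.
std::vector<int> maxFilterHop(const std::vector<int>& img, int width, int height,
                              int w, int h, int k) {
    assert(k >= 1 && w >= 1 && h >= 1 && w <= width && h <= height);
    assert(static_cast<int>(img.size()) == width * height);
    const int outW = (width - w) / k + 1;   // number of window positions in x
    const int outH = (height - h) / k + 1;  // number of window positions in y
    std::vector<int> out(outW * outH);
    for (int oy = 0; oy < outH; ++oy) {
        for (int ox = 0; ox < outW; ++ox) {
            int best = img[(oy * k) * width + ox * k];
            for (int dy = 0; dy < h; ++dy)
                for (int dx = 0; dx < w; ++dx)
                    best = std::max(best, img[(oy * k + dy) * width + (ox * k + dx)]);
            out[oy * outW + ox] = best;
        }
    }
    return out;
}
```

With k = 1 this reduces to the ordinary dense max filter over the valid region.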

For the 1-hop case, there is an existing implementation in the CUDA NPP (NVIDIA Performance Primitives) library; the functions are named nppiFilterMax*. However, I haven't been able to find a generalized version for k-hop sliding. As a workaround, I could use the NPP function to perform the 1-hop max filter and pick the results at the corresponding positions, although this seems wasteful and inefficient. Is there any existing implementation or white paper for this problem?
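
The workaround can be sketched on the CPU as follows. This is only an illustration in C++: `maxFilterDense` is a hypothetical stand-in for a 1-hop filter such as the nppiFilterMax* routines and does not use the NPP API, and `subsample` is likewise a hypothetical helper. Both assume a row-major image and a top-left anchored window without border handling.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a dense (1-hop) max filter; NOT the NPP API.
// Top-left anchored w x h window, valid region only.
std::vector<int> maxFilterDense(const std::vector<int>& img, int width, int height,
                                int w, int h) {
    assert(w >= 1 && h >= 1 && w <= width && h <= height);
    const int outW = width - w + 1;
    const int outH = height - h + 1;
    std::vector<int> out(outW * outH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            int best = img[y * width + x];
            for (int dy = 0; dy < h; ++dy)
                for (int dx = 0; dx < w; ++dx)
                    best = std::max(best, img[(y + dy) * width + (x + dx)]);
            out[y * outW + x] = best;
        }
    }
    return out;
}

// Hypothetical helper: keep every k-th sample in both directions.
std::vector<int> subsample(const std::vector<int>& src, int width, int height, int k) {
    assert(k >= 1);
    std::vector<int> out;
    for (int y = 0; y < height; y += k)
        for (int x = 0; x < width; x += k)
            out.push_back(src[y * width + x]);
    return out;
}
```

For a 5 x 4 image with a 2 x 2 window, the dense result is 4 x 3, and `subsample(dense, 4, 3, 2)` keeps the 2 x 2 outputs at the k-hop positions. The dense pass computes roughly k*k times as many outputs as are kept, which is the waste mentioned above.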

kangshiyin
user1715925

1 Answer

Those image convolution functions are generally designed for hop == 1. When hop > 1, less data is shared between neighboring window positions, so performance may decrease.

For hop > 1, you may need to write your own kernel to get better performance. See the CUDA sample "separable convolution" for more information:

http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-separable-convolution

It comes with a white paper discussing the details. Luckily, max convolution is also separable, so the paper should give you some ideas on how to write your own kernel for this task:

http://docs.nvidia.com/cuda/samples/3_Imaging/convolutionSeparable/doc/convolutionSeparable.pdf
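
As a rough illustration of the separable idea applied to a k-hop max filter, here is a CPU sketch in C++. It is not taken from the CUDA sample or the white paper, and the name `maxFilterHopSeparable` is hypothetical. It assumes a row-major image, a top-left anchored w x h window, and no border handling. The first pass takes the horizontal maximum only at every k-th column, and the second pass takes the vertical maximum of those results at every k-th row.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: separable max filter with a k-pixel hop.
// Assumptions: row-major image, top-left anchored w x h window, valid region only.
std::vector<int> maxFilterHopSeparable(const std::vector<int>& img, int width, int height,
                                       int w, int h, int k) {
    assert(k >= 1 && w >= 1 && h >= 1 && w <= width && h <= height);
    assert(static_cast<int>(img.size()) == width * height);
    const int outW = (width - w) / k + 1;
    const int outH = (height - h) / k + 1;

    // Pass 1: horizontal max of w pixels, evaluated only at x = ox*k.
    // Every input row is kept because the vertical pass may need it.
    std::vector<int> rowMax(outW * height);
    for (int y = 0; y < height; ++y) {
        for (int ox = 0; ox < outW; ++ox) {
            int best = img[y * width + ox * k];
            for (int dx = 1; dx < w; ++dx)
                best = std::max(best, img[y * width + ox * k + dx]);
            rowMax[y * outW + ox] = best;
        }
    }

    // Pass 2: vertical max of h row results, evaluated only at y = oy*k.
    std::vector<int> out(outW * outH);
    for (int oy = 0; oy < outH; ++oy) {
        for (int ox = 0; ox < outW; ++ox) {
            int best = rowMax[(oy * k) * outW + ox];
            for (int dy = 1; dy < h; ++dy)
                best = std::max(best, rowMax[(oy * k + dy) * outW + ox]);
            out[oy * outW + ox] = best;
        }
    }
    return out;
}
```

Each output then costs about w + h comparisons spread over the two passes instead of w * h in a direct window scan. In a CUDA port, each pass would presumably become its own kernel, as in the separable convolution sample.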

kangshiyin
  • Thanks for the suggestion. I actually read that paper before and was about to implement my own version. However, for now it suffices to just run the 1-hop convolution and downsample the result. – user1715925 Oct 28 '13 at 17:20