I'm trying to figure out how to implement max pooling in ArrayFire. My current best approach is to iterate over each convolved output and apply a function that convolves it with four kernels, [1 0 0 0], [0 1 0 0], [0 0 1 0], and [0 0 0 1], producing four outputs that I can then compare to find the maximum value at each pixel.
My issue with this is that looping like that in a tensor library seems terribly slow and wrong, but I haven't been able to come up with a better solution.
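To pin down the operation being asked about, here is a minimal plain C++ sketch of 2x2 max pooling with stride 2. It is only a CPU reference for checking results and uses no ArrayFire calls. The name `max_pool_2x2`, the row-major layout, the 2x2 window, and the stride of 2 are all assumptions made for illustration. The four values compared per output pixel are the same ones the four one-hot kernels above would pick out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ArrayFire function): 2x2 max pooling with
// stride 2 over a row-major image of size rows x cols.
// Assumption: an odd trailing row or column is simply dropped.
std::vector<float> max_pool_2x2(const std::vector<float>& in,
                                std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);

    const std::size_t out_rows = rows / 2;
    const std::size_t out_cols = cols / 2;
    std::vector<float> out(out_rows * out_cols);

    for (std::size_t r = 0; r < out_rows; ++r) {
        for (std::size_t c = 0; c < out_cols; ++c) {
            // Index of the top-left element of this 2x2 window.
            const std::size_t i = 2 * r * cols + 2 * c;
            // The four candidates correspond to the four one-hot kernels:
            // top-left, top-right, bottom-left, bottom-right.
            out[r * out_cols + c] = std::max(
                {in[i], in[i + 1], in[i + cols], in[i + cols + 1]});
        }
    }
    return out;
}
```

For a 4x4 input holding the values 1 through 16 in row-major order, this returns the four window maxima 6, 8, 14, and 16. A vectorized version should produce the same values.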