This is a convolution written in Halide:
output(x, y, z, n) = sum(Input(x * stride + r.x, y * stride + r.y, r.z, n) * Kernel(r.x, r.y, r.z, z));
I was expecting to be able to schedule the sum, like:
sum.compute_at(output, x_inner)
so that the sum is computed for more than one output element at a time.
How can I do this with Halide?
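
To make the intent concrete, here is a rough plain C++ sketch (not Halide) of the loop nest I have in mind, reduced to one dimension for brevity. The names `conv_per_element`, `conv_tiled`, and `tile` are made up for this illustration. The first function is what I believe the inline sum gives me today; the second is what I would like the schedule to produce, with the reduction loop running outside the inner x loop so a small buffer of partial sums is updated for several outputs at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element version: the whole reduction over r runs for one output at a time.
std::vector<float> conv_per_element(const std::vector<float>& input,
                                    const std::vector<float>& kernel,
                                    std::size_t stride) {
    assert(stride > 0 && !kernel.empty() && input.size() >= kernel.size());
    const std::size_t out_w = (input.size() - kernel.size()) / stride + 1;
    std::vector<float> out(out_w, 0.0f);
    for (std::size_t x = 0; x < out_w; ++x) {
        float acc = 0.0f;
        for (std::size_t r = 0; r < kernel.size(); ++r) {
            acc += input[x * stride + r] * kernel[r];
        }
        out[x] = acc;
    }
    return out;
}

// Tiled version (hypothetical target): for each tile of outputs, the reduction
// loop over r is outermost and updates a small accumulator buffer that covers
// the whole tile, instead of a single scalar.
std::vector<float> conv_tiled(const std::vector<float>& input,
                              const std::vector<float>& kernel,
                              std::size_t stride,
                              std::size_t tile) {
    assert(stride > 0 && tile > 0);
    assert(!kernel.empty() && input.size() >= kernel.size());
    const std::size_t out_w = (input.size() - kernel.size()) / stride + 1;
    std::vector<float> out(out_w, 0.0f);
    std::vector<float> acc(tile);
    for (std::size_t x_outer = 0; x_outer < out_w; x_outer += tile) {
        // The last tile may be narrower than `tile`.
        const std::size_t width = std::min(tile, out_w - x_outer);
        std::fill(acc.begin(), acc.end(), 0.0f);
        for (std::size_t r = 0; r < kernel.size(); ++r) {
            for (std::size_t xi = 0; xi < width; ++xi) {
                acc[xi] += input[(x_outer + xi) * stride + r] * kernel[r];
            }
        }
        for (std::size_t xi = 0; xi < width; ++xi) {
            out[x_outer + xi] = acc[xi];
        }
    }
    return out;
}
```

Both functions add the terms for each output in the same order, so they should give identical results; only the loop structure differs.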