I am using a Generator to create a static library for my Halide module. I am comparing the default schedule, AutoScheduler, and a GPU schedule that uses simple tiling. I have two inputs of the same size ("source" and "reference") and one output.

Everything works fine until I run it on an input smaller than 64x64. Both the auto-scheduled and GPU-scheduled versions produce this error on a 63x63 input:

Error: Output buffer output is accessed at -1, which is before the min (0) in dimension 0.

As the input size goes down, the erroneous index decrements with it (e.g. a 62x62 input produces `output is accessed at -2`, 61x61 produces `-3`, and so on).

I'm confused because I don't get this error with the default schedule, only with the auto-scheduled and GPU-scheduled versions. I also don't know why the issue starts below 64x64. Can anyone help, please? How do I make this work for an input of any size?

#include "Halide.h"
using namespace Halide;

class MyGenerator : public Halide::Generator<MyGenerator> {
public:

    // Input and output parameters
    Input<Buffer<uint8_t>> source{"src", 2};
    Input<Buffer<uint8_t>> reference{"ref", 2};
    Input<int> radius{"radius"};

    Output<Buffer<uint8_t>> output{"output", 2};

    void generate() {
        // Pad both inputs with zeros so the stencil can read past the edges.
        src_clamped = BoundaryConditions::constant_exterior(source, 0);
        ref_clamped = BoundaryConditions::constant_exterior(reference, 0);
        /* snipped for brevity; this part just shows I'm using padding
           and calculating output only at (x, y) */
        output(x, y) = ... ;
    }

    void schedule() {
        if (auto_schedule) {
            source.dim(0).set_estimate(0, 3000);
            source.dim(1).set_estimate(0, 4000);
            reference.dim(0).set_estimate(0, 3000);
            reference.dim(1).set_estimate(0, 4000);

            radius.set_estimate(5);

            output.set_estimate(x, 0, 3000);
            output.set_estimate(y, 0, 4000);
        } else {
            Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
            if (get_target().has_gpu_feature()) {
                std::cout << "Using GPU schedule\n";

                const int EXPECTED_RADIUS = 5;
                int kernel_w = EXPECTED_RADIUS * 2 + 1;
                output.gpu_tile(x, y, xo, yo, xi, yi, kernel_w, kernel_w);

            } else {
                std::cout << "Using CPU schedule\n";
                // No directives here: fall back to Halide's default schedule.
            }
        }
    }
private:
    // Variables used to index pixel locations
    Var x{"x"}, y{"y"}, dx{"dx"}, dy{"dy"};

    // Padded versions of the inputs, defined in generate()
    Func src_clamped{"src_clamped"}, ref_clamped{"ref_clamped"};
};
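
For completeness, a generator like this also needs a file-scope registration macro before it can be built into a static library. A minimal sketch (the registry name `my_generator` is a placeholder for whatever name the build scripts expect):

HALIDE_REGISTER_GENERATOR(MyGenerator, my_generator)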

1 Answer

In Halide, scheduling directives can impose constraints on the sizes of inputs and outputs. For example, if you compute something in 64x64 tiles, then (by default) the output needs to be at least 64x64 in size. There are various ways to avoid this if you want a schedule to also support very small inputs. One way is to tell Halide to use a specialized schedule for small inputs via `Func::specialize`. Another way is to pass `TailStrategy::GuardWithIf` as the last argument to `gpu_tile`; that generates masking code inside the GPU kernel, so it may be slightly slower for cases larger than 64x64. Both options cost some extra code size, which may or may not be a good trade-off depending on your use case (which is why this isn't automatic).
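
As a rough, untested sketch against the schedule in the question (reusing its `kernel_w` tile size; conditioning the specialization on `source` is my assumption, since the inputs and output are the same size), pick one of the two:

// Option 1: mask the tile tails inside the GPU kernel.
output.gpu_tile(x, y, xo, yo, xi, yi, kernel_w, kernel_w,
                TailStrategy::GuardWithIf);

// Option 2: specialize for inputs at least one tile in size, and fall
// back to masked tiles otherwise. specialize() copies the schedule as
// it exists at the point it is called, so it must come before the
// gpu_tile() for the default path.
output.specialize(source.width() >= kernel_w && source.height() >= kernel_w)
      .gpu_tile(x, y, xo, yo, xi, yi, kernel_w, kernel_w);
output.gpu_tile(x, y, xo, yo, xi, yi, kernel_w, kernel_w,
                TailStrategy::GuardWithIf);

Either way, the compiled pipeline handles any output size, at the cost of a bounds check inside the kernel (and, for Option 2, some extra code size for the second copy of the schedule).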

Andrew Adams
  • Thank you for your insight, Andrew. Why does the GPU kernel create 64x64 tiles? In my code, it is set to `kernel_w`, which is 11. Is it not making 11x11 tiles? – Andrew Jong Jan 29 '20 at 00:57
  • Could you post some sample code? I tried the following, which did not work: `output.gpu_tile(x, y, xo, yo, xi, yi, 16, 16); output.specialize(source.width() < 64 || source.height() < 64).gpu_tile(x, y, xo, yo, xi, yi, 4, 4);` – Andrew Jong Jan 29 '20 at 01:16