I've attended a few Halide panels over the years at Siggraph, and I finally decided to do some testing to determine whether porting my existing software to it would be worthwhile. So far the results have been impressive.

I was writing a Gaussian Blur based on the code presented at Siggraph 2015 and ran into some weird behavior that I can't make sense of. I'm not sure if it is my own misunderstanding or some kind of bug/"feature".

See the code below and note the empty loop. gkernel and normalize are functions I've written to produce the Gaussian coefficients. When I compile and run the code with the loop commented out, the output image is black (all zeros). When I leave the empty loop in, the function executes much faster and the output image is correctly blurred.

Am I missing something fundamental or is this some sort of bug? I'm using MSVS Professional 2013 on Windows 7.

Function Code:

Func HalideGBlur(Func f){
    float k[3];
    gkernel(k);
    normalize(k);

    for (int i = 0; i < 1; i++){
        ;
    }

    Func ypass;
    ypass(X, Y, C) = ( k[1] * f(X, Y, C) +
                       k[0] * (f(X, Y - 1, C) + f(X, Y + 1, C)) );
    Func xpass;
    xpass(X, Y, C) = ( k[1] * ypass(X, Y, C) +
                       k[0] * (ypass(X -1, Y, C) + ypass(X + 1, Y, C)) );

    //scheduling for x and y passes
    xpass.compute_root().vectorize(X, 8).parallel(Y);
    ypass.compute_at(xpass, Y).vectorize(X, 8);
    return xpass;
}

Relevant Execution code:

Func g = HalideGBlur(bounded_image);

htime = ocvtime = FLT_MAX;
cout << "\n****Testing Gaussian Blur****\n";
//Run Halide tests
for (int x = 0; x < 10; x++){
    start_time = omp_get_wtime();
    g.realize(output);
    end = omp_get_wtime() - start_time;
    if (end < htime){ htime = end; }
}
cout << "halide best: " << htime << "\n";

Results without the meaningless loop:

****Testing Gaussian Blur****
halide best: 0.0246554
ocv best: 0.0318704
Halide is 1.2926 times as fast as OpenCV.

Results with the meaningless loop:

****Testing Gaussian Blur****
halide best: 0.00749808
ocv best: 0.0317644
Halide is 4.2363 times as fast as OpenCV.
BHawk

1 Answer

That's a puzzler. Maybe you have a memory-stomping bug and that loop is affecting stack frame layout. Is there a valgrind equivalent on Windows you can use to check for this?
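If an out-of-bounds write in the coefficient helpers is the suspect, one cheap check is to make the kernel length part of the type and use checked indexing. The sketch below is only an illustration: the bodies of gkernel and normalize were not posted, so these versions, the sigma parameter, and the use of std::array are all assumptions, not the original code. It follows the question's layout, where k[1] is the centre tap and k[0] the side tap.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-ins for the question's gkernel/normalize helpers.
// The array length N is part of the type, so a caller cannot pass a
// buffer of the wrong size, and .at() throws std::out_of_range instead
// of silently writing past the end.
template <std::size_t N>
void gkernel(std::array<float, N>& k, float sigma = 1.0f) {
    const int center = static_cast<int>(N / 2);
    for (std::size_t i = 0; i < N; ++i) {
        // Signed distance of tap i from the centre tap.
        const float d = static_cast<float>(static_cast<int>(i) - center);
        k.at(i) = std::exp(-(d * d) / (2.0f * sigma * sigma));
    }
}

// Scale the taps so that they sum to 1.
template <std::size_t N>
void normalize(std::array<float, N>& k) {
    float sum = 0.0f;
    for (float v : k) {
        sum += v;
    }
    for (std::size_t i = 0; i < N; ++i) {
        k.at(i) /= sum;
    }
}
```

With std::array<float, 3> k{}; gkernel(k); normalize(k); the values k[0] and k[1] can be used in the Func definitions exactly as before. If the black output disappears with checked helpers, that would point at the original helpers rather than at Halide.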

Andrew Adams
  • I ran VS's performance profiler and didn't find any obvious memory address issues. I did try some other experiments that suggest the possibility of memory-stomping. I removed the loop and tried declaring an integer. I initialized the array with 4 elements instead of 3. This fixed the problem and the output image was correct. – BHawk Aug 26 '15 at 17:05
  • EDIT - I removed the loop and tried declaring an integer: output image was still black. I initialized the array with 4 elements instead of 3: output image was correct (suggesting memory stomp). It seems strange that a memory stomp would affect the result image, especially since I am not referencing the 3rd [2] element in the array in the calculation at all. I'll keep experimenting and see if I can get any more useful data on the issue. – BHawk Aug 26 '15 at 17:15
  • @BHawk, Were you able to implement a fast Gaussian Blur with Halide? Have you compared to Intel IPP? – Royi Mar 14 '22 at 12:39