The loop nest printed for Halide::sum does not match the optimal one described in the tutorial.

The code below produces separate loops for zero initialization and summation.

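  #include "Halide.h"

  int main() {
    Halide::Func f("f");
    Halide::Var x("x");
    Halide::RDom r(0, 3);  // reduction domain: r takes the values 0, 1, 2

    // The inline reduction shows up as a helper stage named "sum" in the output.
    f(x) = Halide::sum(r + x);
    f.print_loop_nest();

    f.realize(10);
    return 0;
  }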

Output:

produce f:
  for x:
    produce sum:
      for x:
        sum(...) = ...
      for x:
        for r4:
          sum(...) = ...
    consume sum:
      f(...) = ...

Can these loops be fused, or does this not affect performance? Thanks!


Update: I would like the loops fused like this:

produce f:
  for x:
    produce sum:
      for x:
        sum(...) = ...
        for r4:
          sum(...) = ...
    consume sum:
      f(...) = ...
Dmitry Kurtaev

1 Answer

This is a case of print_loop_nest being confusing. The inner loops over x inside "produce sum" have extent 1, so they go away in the compiled code. The compiled loop nest is the one you want. Only the outer loop over x is non-trivial, so this is what's really happening:

produce f:
  for x:
    produce sum:
      sum(...) = ...
      for r4:
        sum(...) = ...
    consume sum:
      f(...) = ...
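
To make the equivalence concrete, here is a rough plain C++ sketch of the two loop nests. It is hand-written for illustration and is not what Halide actually generates; the function names printed_nest and simplified_nest are hypothetical, and the bounds of the inner x loops are assumed to cover only the single value the current outer iteration needs.

```cpp
#include <cassert>
#include <vector>

// Mirrors the nest shown by print_loop_nest: the inner loops over x
// are kept, but each covers exactly one value (extent 1).
std::vector<int> printed_nest(int width) {
    assert(width >= 0);
    std::vector<int> f(width);
    for (int x = 0; x < width; ++x) {           // for x (outer loop)
        int sum = 0;                             // storage for one value of "sum"
        for (int xi = x; xi < x + 1; ++xi) {     // for x (extent 1): initialization
            sum = 0;
        }
        for (int xi = x; xi < x + 1; ++xi) {     // for x (extent 1): update
            for (int r = 0; r < 3; ++r) {        // for r4
                sum += r + xi;
            }
        }
        f[x] = sum;                              // consume sum
    }
    return f;
}

// Mirrors the simplified nest: the extent-1 loops are dropped.
std::vector<int> simplified_nest(int width) {
    assert(width >= 0);
    std::vector<int> f(width);
    for (int x = 0; x < width; ++x) {            // for x
        int sum = 0;                             // sum(...) = 0
        for (int r = 0; r < 3; ++r) {            // for r4
            sum += r + x;
        }
        f[x] = sum;                              // consume sum
    }
    return f;
}
```

Both functions return the same values for any width, since a loop that runs exactly once is the same as its body. With r ranging over 0, 1, 2, each output is 3 * x + 3.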
Andrew Adams