I'm discovering Halide and have had some success with a pipeline doing various transformations. Most of these are based on the examples within the sources (color-transformations, various filters, hist-eq).

My next step needs to process the image in blocks or, in a more general form, in partially overlapping blocks.

Examples

Input:

      [  1,  2,  3,  4,  5,  6,  7,  8,
         9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, 32]
Non-overlapping blocks:

Size: 2x4

      [ 1,  2,  3,  4,
        9, 10, 11, 12]

      [  5,  6,  7,  8,
        13, 14, 15, 16]

      [ 17, 18, 19, 20,
        25, 26, 27, 28]

      [ 21, 22, 23, 24,
        29, 30, 31, 32]
Overlapping blocks:

Size: 2x4 with 50% overlap (both axes)

      [ 1,  2,  3,  4,
        9, 10, 11, 12]

      [ 3,  4, 5, 6,
        11, 12, 13, 14]

      [ 5,  6, 7, 8,
       13, 14, 15, 16]

       -

      [ 9, 10, 11, 12,
       17, 18, 19, 20]

      [11, 12, 13, 14,
       19, 20, 21, 22]

       ...

I suspect there should be a nice way to express these, as they are quite common in many algorithms (e.g. macroblocks).

What i checked out

I tried to gather ideas from the tutorial and example apps and found the following, which seem somewhat connected to what i want to implement:

Target

So in general i'm asking how to implement a block-based view which can then be processed by other steps.

  • It would be nice if the approach were general enough to handle both overlapping and non-overlapping blocks

    • Somehow generating the top-left indices first?
  • In my case, the image-dimension is known at compile-time which simplifies this

    • But I would still like some compact form which is nice to work with from Halide's perspective (no hand-coded stuff like those examples with small filter-boxes)
  • The approach might depend on the output per block, which is a scalar in my case

Maybe someone can give me some ideas and/or some examples (which would be very helpful).

I'm sorry for not providing code, as i don't think i could produce anything helpful.

Edit: Solution

After dsharlet's answer and some tiny debugging/discussion here, the following very simplified self-contained code works (assuming a 1-channel 64x128 input like this one I created).

#include "Halide.h"
#include "Halide/tools/halide_image_io.h"
#include <iostream>

int main(int argc, char **argv) {
  Halide::Buffer<uint8_t> input = Halide::Tools::load_image("TestImages/block_example.png");

  // This is a simple example assuming an input of 64x128
  std::cout << "dim 0: " << input.width() << std::endl;
  std::cout << "dim 1: " << input.height() << std::endl;

  // The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
  Halide::Var xo, yo, xi, yi, x, y;

  // The distance between the start of each tile in the input.
  int tile_stride_x = 32;
  int tile_stride_y = 64;
  int tile_size_x = 32;
  int tile_size_y = 64;

  Halide::Func tiled_f;
  tiled_f(xi, yi, xo, yo) = input(xo * tile_stride_x + xi, yo * tile_stride_y + yi);

  Halide::RDom tile_dom(0, tile_size_x, 0, tile_size_y);
  Halide::Func tile_means;
  tile_means(xo, yo) = sum(Halide::cast<uint32_t>(tiled_f(tile_dom.x, tile_dom.y, xo, yo))) / (tile_size_x * tile_size_y);

  Halide::Func output;
  output(xo, yo) = Halide::cast<uint8_t>(tile_means(xo, yo));

  Halide::Buffer<uint8_t> output_(2, 2);
  output.realize(output_);

  Halide::Tools::save_image(output_, "block_based_stuff.png");
}
sascha

1 Answer

Here's an example that breaks a Func into blocks of arbitrary stride and size:

Func f = ... // The thing being blocked

// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Var xo, yo, xi, yi;
// The distance between the start of each tile in the input.
int tile_stride_x, tile_stride_y;

Func tiled_f;
tiled_f(xi, yi, xo, yo) = f(xo * tile_stride_x + xi, yo * tile_stride_y + yi);

Func tiled_output;
tiled_output(xi, yi, xo, yo) = ... // Your tiled processing here

To compute some reduction (like statistics) on each block, you can do the following:

RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Func tile_means;
tile_means(xo, yo) = sum(tiled_output(tile_dom.x, tile_dom.y, xo, yo)) / (tile_size_x * tile_size_y);
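For checking the Halide result against something simple, here is a plain C++ sketch (no Halide) of the same per-tile mean. The function name `tile_means_reference` and the row-major `std::vector` layout are assumptions made for illustration only. It accumulates in a 32-bit integer, in the same spirit as the `uint32_t` cast in the question's working code, because an 8-bit accumulator would wrap for any realistic tile.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference (non-Halide) per-tile mean of a row-major 8-bit image.
// Only whole tiles are visited; the mean is truncated to an integer.
std::vector<std::uint8_t> tile_means_reference(
        const std::vector<std::uint8_t> &img, int width, int height,
        int tile_size_x, int tile_size_y,
        int tile_stride_x, int tile_stride_y) {
    assert(static_cast<int>(img.size()) == width * height);
    assert(tile_size_x > 0 && tile_size_y > 0);
    assert(tile_stride_x > 0 && tile_stride_y > 0);
    assert(width >= tile_size_x && height >= tile_size_y);

    const int tiles_x = (width - tile_size_x) / tile_stride_x + 1;
    const int tiles_y = (height - tile_size_y) / tile_stride_y + 1;

    std::vector<std::uint8_t> means(tiles_x * tiles_y);
    for (int yo = 0; yo < tiles_y; ++yo) {
        for (int xo = 0; xo < tiles_x; ++xo) {
            std::uint32_t sum = 0;  // wide accumulator, avoids 8-bit wrap-around
            for (int yi = 0; yi < tile_size_y; ++yi) {
                for (int xi = 0; xi < tile_size_x; ++xi) {
                    const int x = xo * tile_stride_x + xi;
                    const int y = yo * tile_stride_y + yi;
                    sum += img[y * width + x];
                }
            }
            means[yo * tiles_x + xo] =
                static_cast<std::uint8_t>(sum / (tile_size_x * tile_size_y));
        }
    }
    return means;
}
```

Called on the 8x4 input from the question with 4x2 tiles and strides equal to the tile size, this gives one truncated mean per block, in row-major block order.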

Flattening the tiles back into a single result is a bit tricky. It probably depends on your method of combining the results in overlapped areas. If you want to add up the overlapping tiles, the simplest way is probably to use an RDom:

RDom tiles_dom(
    0, tile_size_x,
    0, tile_size_y,
    min_tile_xo, extent_tile_xo,
    min_tile_yo, extent_tile_yo);

Func output;
Expr output_x = tiles_dom[2] * tile_stride_x + tiles_dom[0];
Expr output_y = tiles_dom[3] * tile_stride_y + tiles_dom[1];
output(x, y) = 0;
output(output_x, output_y) += tiled_output(tiles_dom[0], tiles_dom[1], tiles_dom[2], tiles_dom[3]);
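To make the scatter-add above easier to reason about, here is a plain C++ sketch (no Halide) of the same loop nest. The name `overlap_add` and the callable `tiled_output` parameter are hypothetical, and the sketch assumes `min_tile_xo` and `min_tile_yo` are both 0. Passing a tile function that always returns 1 yields, for each pixel, the number of tiles covering it, which is one way to see what would be needed to normalize the overlapped regions.

```cpp
#include <cassert>
#include <vector>

// Scatter-add every tile value back to image coordinates. Pixels covered
// by several tiles accumulate one contribution per covering tile.
// Assumes tile indices start at 0 (min_tile_xo == min_tile_yo == 0).
template <typename TileFn>
std::vector<int> overlap_add(int width, int height,
                             int tile_size_x, int tile_size_y,
                             int tile_stride_x, int tile_stride_y,
                             int extent_tile_xo, int extent_tile_yo,
                             TileFn tiled_output) {
    std::vector<int> output(width * height, 0);  // output(x, y) = 0
    for (int yo = 0; yo < extent_tile_yo; ++yo) {
        for (int xo = 0; xo < extent_tile_xo; ++xo) {
            for (int yi = 0; yi < tile_size_y; ++yi) {
                for (int xi = 0; xi < tile_size_x; ++xi) {
                    const int x = xo * tile_stride_x + xi;
                    const int y = yo * tile_stride_y + yi;
                    assert(x < width && y < height);  // tiles must stay inside the image
                    output[y * width + x] += tiled_output(xi, yi, xo, yo);
                }
            }
        }
    }
    return output;
}
```

With non-overlapping tiles every pixel receives exactly one contribution; with 50% overlap on both axes, interior pixels receive up to four.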

Note that in the above two blocks of code, tile_stride_x and tile_size_x are independent parameters, allowing for any tile size and overlap.

In both of your examples, tile_size_x = 4, and tile_size_y = 2. To get non-overlapping tiles, set the tile strides equal to the tile size. To get 50% overlapping tiles, set tile_stride_x = 2, and tile_stride_y = 1.
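As a sanity check of these parameter choices, the index mapping `xo * tile_stride_x + xi` can be tried on the 8x4 input from the question in plain C++ (no Halide). The `Image` struct and the helpers `extract_tile` and `question_input` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <vector>

// Row-major image of width x height pixels.
struct Image {
    int width;
    int height;
    std::vector<int> data;
    int at(int x, int y) const { return data[y * width + x]; }
};

// Copy the tile with block index (xo, yo) into a row-major vector.
// Mirrors tiled_f(xi, yi, xo, yo) = f(xo * stride_x + xi, yo * stride_y + yi).
std::vector<int> extract_tile(const Image &img, int xo, int yo,
                              int tile_size_x, int tile_size_y,
                              int tile_stride_x, int tile_stride_y) {
    const int x0 = xo * tile_stride_x;  // top-left corner of the tile
    const int y0 = yo * tile_stride_y;
    assert(x0 >= 0 && y0 >= 0);
    assert(x0 + tile_size_x <= img.width && y0 + tile_size_y <= img.height);

    std::vector<int> tile;
    tile.reserve(tile_size_x * tile_size_y);
    for (int yi = 0; yi < tile_size_y; ++yi)
        for (int xi = 0; xi < tile_size_x; ++xi)
            tile.push_back(img.at(x0 + xi, y0 + yi));
    return tile;
}

// The 8x4 input from the question: values 1..32 in row-major order.
Image question_input() {
    Image img{8, 4, std::vector<int>(32)};
    for (int i = 0; i < 32; ++i) img.data[i] = i + 1;
    return img;
}
```

With strides (4, 2), block (1, 0) is the second non-overlapping block listed in the question; with strides (2, 1), blocks (1, 0) and (1, 1) are the overlapping blocks starting at 3 and at 11.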

A useful schedule for an algorithm like this is:

// Compute tiles as needed by the output.
tiled_output.compute_at(output, tiles_dom[2]);
// or
tiled_output.compute_at(tile_means, xo);

There are other options, like using a pure func (no update/RDom) that uses the mod operator to figure out tile inner and outer indices. However, this approach can be difficult to schedule efficiently with overlapping tiles (depending on the processing you do at each tile). I use the RDom approach when this problem comes up.

Note that with the RDom approach, you have to supply the bounds of the tile indices you want computed (min_tile_xo, extent_tile_xo, ...), which can be tricky for overlapped tiles.
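One way to get those bounds, assuming tiles start at index 0 and only whole tiles that fit inside the image are wanted (partial tiles at the border are ignored), is the usual sliding-window count. The helper name `tile_count` is hypothetical; this is a plain C++ sketch, not part of Halide.

```cpp
#include <cassert>

// Number of whole tiles of tile_size pixels that fit into extent pixels
// when consecutive tiles start tile_stride pixels apart.
// Assumes the first tile starts at pixel 0.
int tile_count(int extent, int tile_size, int tile_stride) {
    assert(tile_size > 0 && tile_stride > 0);
    if (extent < tile_size) return 0;  // not even one whole tile fits
    return (extent - tile_size) / tile_stride + 1;
}
```

For the 8x4 example with 4x2 tiles, this gives 2 tiles per axis without overlap and 3 per axis with 50% overlap; those values would play the role of `extent_tile_xo` and `extent_tile_yo`, with `min_tile_xo` and `min_tile_yo` set to 0.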

dsharlet
  • This looks great and addresses a lot. Thanks! Upvoted. Give me some time to digest this before a possible accepted answer marking! – sascha Apr 17 '17 at 17:15
  • I'm still working on and analyzing it. But if you have some time & motivation: could you add some processing like this: each tile computes its pixel-mean (e.g. mean of 4*2 values) and the output is the matrix of these means (so the dimensions are smaller now and depend on the block-vars). At the moment I have some trouble with that output (general idea missing) and, more importantly, I don't know how to put that mean-calc inside of tiled_output. – sascha Apr 17 '17 at 18:18
  • I added the tile_means example. You don't need any explicit scheduling with that specific algorithm to get the compute to have good locality, but you might want to compute the tiled_output at each tile, which I added to the schedule example. – dsharlet Apr 17 '17 at 19:52
  • I don't think I have understood Halide at all. I'm not able to convert your answer into working code (and I'm very sure that's because of my wrong understanding, as you seem to be very familiar with Halide). If you have some time, you might have a look at [this example image I created](http://i.imgur.com/Eyo0Xvc.png). I try to get the mean of those 4 blocks with [this simplified code](https://gist.github.com/sschnug/c8eeaf5cd81eb1ec04220fb36bb202c7), which compiles and runs but only outputs zeros. I used `tiled_f` as input to `tile_means`, as that's my understanding. – sascha Apr 18 '17 at 00:16