I'm discovering Halide and have had some success with a pipeline that performs various transformations. Most of these are based on the examples shipped with the sources (color transformations, various filters, hist-eq).
My next step needs to process the image in blocks or, in a more general form, in partially overlapping blocks.
Examples
Input:
[ 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32]
Non-overlapping blocks:
Size: 2x4
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 5, 6, 7, 8,
13, 14, 15, 16]
[ 17, 18, 19, 20,
25, 26, 27, 28]
[ 21, 22, 23, 24,
29, 30, 31, 32]
Overlapping blocks:
Size: 2x4 with 50% overlap (both axes)
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 3, 4, 5, 6,
11, 12, 13, 14]
[ 5, 6, 7, 8,
13, 14, 15, 16]
-
[ 9, 10, 11, 12,
17, 18, 19, 20]
[11, 12, 13, 14,
19, 20, 21, 22]
...
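To pin down the indexing behind the examples above, here is a minimal plain C++ sketch (deliberately not Halide). The helper name extract_block and its parameters are made up for illustration; it assumes a row-major image and that the requested block lies fully inside it. A stride equal to the block size gives the non-overlapping case, and half the block size gives the 50% overlap case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Halide): copies one block out of a
// row-major image. (bx, by) is the block index; the stride is the distance
// between the top-left corners of neighbouring blocks.
std::vector<int> extract_block(const std::vector<int>& image, int width, int height,
                               int block_w, int block_h,
                               int stride_x, int stride_y,
                               int bx, int by) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    const int x0 = bx * stride_x;  // left column of the block
    const int y0 = by * stride_y;  // top row of the block
    assert(x0 >= 0 && y0 >= 0);
    assert(x0 + block_w <= width && y0 + block_h <= height);

    std::vector<int> block;
    block.reserve(static_cast<std::size_t>(block_w) * block_h);
    for (int yi = 0; yi < block_h; ++yi) {
        for (int xi = 0; xi < block_w; ++xi) {
            block.push_back(image[(y0 + yi) * width + (x0 + xi)]);
        }
    }
    return block;
}
```

For the 8-wide, 4-high input above, a block of 4 columns by 2 rows with strides (4, 2) reproduces the non-overlapping blocks, and strides (2, 1) reproduce the overlapping ones.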
I suspect there should be a nice way to express these, as they are also quite common in many algorithms (e.g. macroblocks).
What I checked out
I tried to gather ideas from the tutorial and example apps and found the following, which seem somewhat connected to what I want to implement:
- Halide tutorial lesson 6: Realizing Funcs over arbitrary domains
// We start by creating an image that represents that rectangle
Image<int> shifted(5, 7); // In the constructor we tell it the size
shifted.set_min(100, 50); // Then we tell it the top-left corner
- The problem I have: how to generalize this to multiple shifted domains without looping?
- Halide tutorial lesson 9: Multi-pass Funcs, update definitions, and reductions
- Here RDom is introduced, which looks nice for creating a block view
- Most examples using RDom seem to be sliding-window-like approaches where there are no jumps
Target
So in general I'm asking how to implement a block-based view which can then be processed by other steps.
It would be nice if the approach were general enough to handle both overlapping and non-overlapping blocks
- Somehow generating the top-left indices first?
In my case, the image dimensions are known at compile time, which simplifies this
- But I would still like some compact form which is nice to work with from Halide's perspective (no hand-coded stuff like those examples with small filter boxes)
- The approach used might depend on the output per block, which is a scalar in my case
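The "top-left indices first" idea could look roughly like the following plain C++ sketch (again not Halide). The helper name block_origins is hypothetical, and it assumes that only blocks fitting entirely inside the image are wanted, so the count per axis is (dimension - block size) / stride + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists the top-left (x, y) corner of every block that
// fits entirely inside the image, in row-major block order.
std::vector<std::pair<int, int>> block_origins(int width, int height,
                                               int block_w, int block_h,
                                               int stride_x, int stride_y) {
    assert(stride_x > 0 && stride_y > 0);
    assert(block_w > 0 && block_h > 0);
    assert(block_w <= width && block_h <= height);

    // Number of complete blocks along each axis.
    const int blocks_x = (width - block_w) / stride_x + 1;
    const int blocks_y = (height - block_h) / stride_y + 1;

    std::vector<std::pair<int, int>> origins;
    origins.reserve(static_cast<std::size_t>(blocks_x) * blocks_y);
    for (int by = 0; by < blocks_y; ++by) {
        for (int bx = 0; bx < blocks_x; ++bx) {
            origins.emplace_back(bx * stride_x, by * stride_y);
        }
    }
    return origins;
}
```

Because every origin is just the block index times the stride, the list never has to be materialized; the same arithmetic can be written directly as an index expression.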
Maybe someone can give me some ideas and/or examples (which would be very helpful).
I'm sorry for not providing code, as I don't think I could produce anything helpful.
Edit: Solution
After dsharlet's answer and some tiny debugging/discussion here, the following very simplified, self-contained code works (assuming a 1-channel 64x128 input like this one I created).
#include "Halide.h"
#include "Halide/tools/halide_image_io.h"
#include <iostream>

int main(int argc, char **argv) {
    Halide::Buffer<uint8_t> input = Halide::Tools::load_image("TestImages/block_example.png");

    // This is a simple example assuming an input of 64x128
    std::cout << "dim 0: " << input.width() << std::endl;
    std::cout << "dim 1: " << input.height() << std::endl;

    // The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
    Halide::Var xo, yo, xi, yi, x, y;

    // The distance between the start of each tile in the input.
    // A stride equal to the tile size gives non-overlapping tiles;
    // a smaller stride makes neighbouring tiles overlap.
    int tile_stride_x = 32;
    int tile_stride_y = 64;
    int tile_size_x = 32;
    int tile_size_y = 64;

    // 4D view of the input: pixel (xi, yi) inside tile (xo, yo).
    Halide::Func tiled_f;
    tiled_f(xi, yi, xo, yo) = input(xo * tile_stride_x + xi, yo * tile_stride_y + yi);

    // Reduce over all pixels of one tile to get a scalar (the mean) per tile.
    Halide::RDom tile_dom(0, tile_size_x, 0, tile_size_y);
    Halide::Func tile_means;
    tile_means(xo, yo) = sum(Halide::cast<uint32_t>(tiled_f(tile_dom.x, tile_dom.y, xo, yo))) / (tile_size_x * tile_size_y);

    Halide::Func output;
    output(xo, yo) = Halide::cast<uint8_t>(tile_means(xo, yo));

    // One output pixel per tile: a 64x128 input with 32x64 tiles gives 2x2 tiles.
    Halide::Buffer<uint8_t> output_(2, 2);
    output.realize(output_);

    Halide::Tools::save_image(output_, "block_based_stuff.png");
}
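As a sanity check for the pipeline above, a plain C++ reference (not Halide) can compute the same per-tile means with the same truncating integer division. This is only a sketch: the function name reference_tile_means is made up, the image is assumed to be a row-major, single-channel uint8_t array, and only tiles that fit completely inside the image are produced. With a 64x128 input, 32x64 tiles and strides of 32 and 64 it yields the 2x2 result realized above; with strides of 16 and 32 (50% overlap) the formula gives 3x3 tiles, so the Halide output buffer would presumably need to be 3x3 as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation: mean of every tile, row-major over
// the tile grid. Mirrors tiled_f / tile_means from the Halide code, using the
// same truncating integer division.
std::vector<uint8_t> reference_tile_means(const std::vector<uint8_t>& image,
                                          int width, int height,
                                          int tile_size_x, int tile_size_y,
                                          int tile_stride_x, int tile_stride_y) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(tile_stride_x > 0 && tile_stride_y > 0);
    assert(tile_size_x > 0 && tile_size_y > 0);
    assert(tile_size_x <= width && tile_size_y <= height);

    // Number of complete tiles along each axis.
    const int tiles_x = (width - tile_size_x) / tile_stride_x + 1;
    const int tiles_y = (height - tile_size_y) / tile_stride_y + 1;
    const uint32_t pixels_per_tile = static_cast<uint32_t>(tile_size_x) * tile_size_y;

    std::vector<uint8_t> means;
    means.reserve(static_cast<std::size_t>(tiles_x) * tiles_y);
    for (int yo = 0; yo < tiles_y; ++yo) {
        for (int xo = 0; xo < tiles_x; ++xo) {
            uint32_t sum = 0;
            for (int yi = 0; yi < tile_size_y; ++yi) {
                for (int xi = 0; xi < tile_size_x; ++xi) {
                    const int x = xo * tile_stride_x + xi;
                    const int y = yo * tile_stride_y + yi;
                    sum += image[static_cast<std::size_t>(y) * width + x];
                }
            }
            means.push_back(static_cast<uint8_t>(sum / pixels_per_tile));
        }
    }
    return means;
}
```

Comparing this against the realized Halide buffer for a few stride settings is a cheap way to catch off-by-one errors in the tile indexing.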