In OpenCL we get an efficient hardware path for input arguments when we declare them as const global float* restrict, as in this piece of handwritten OpenCL code:
__kernel void oclConvolveGlobalMem(const global float* restrict input,
constant float* restrict filterWeights,
global float* restrict output)
However, running with HL_DEBUG_CODEGEN=1 shows that Halide generates:
// Address spaces for kernel_conv_70_s0_y___block_id_y
#define __address_space__conv__70 __global
#define __address_space__input __global
#define __address_space__kernel __global
__kernel void kernel_conv_70_s0_y___block_id_y(
const int _conv__70_extent_0,
const int _conv__70_extent_1,
const int _conv__70_min_0,
const int _conv__70_min_1,
const int _conv__70_stride_1,
const int _input_min_0,
const int _input_min_1,
const int _input_stride_1,
const int _kernel_min_0,
const int _kernel_min_1,
const int _kernel_stride_1,
__address_space__conv__70 float *_conv__70,
__address_space__input const float *_input,
__address_space__kernel const float *_kernel,
__address_space___shared int16* __shared)
where the input arguments are not declared restrict. I expect this to seriously limit performance. How do I get Halide to add the information that the pointers are restricted (that is, the buffers they point to do not alias)?
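To make clear what I mean by "restricted", here is a minimal sketch in plain C99 rather than OpenCL C, so that it can be compiled on the host. The function name convolve_restrict and its parameters are hypothetical and come from neither Halide nor my kernel. The restrict qualifiers promise the compiler that the output buffer is never accessed through the input or weight pointers, which is the same guarantee I would like the generated kernel signature to carry. Whether this helps, and by how much, presumably depends on the compiler and the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not Halide-generated code).
 * 1D convolution: output[i] = sum over k of input[i + k] * weights[k].
 * 'input' must hold at least out_len + taps - 1 elements.
 * The 'restrict' qualifiers tell the compiler that the three buffers do
 * not alias, so loads from 'input' and 'weights' need not be reloaded
 * after each store to 'output'. */
void convolve_restrict(const float *restrict input,
                       const float *restrict weights,
                       float *restrict output,
                       size_t out_len, size_t taps)
{
    assert(input != NULL && weights != NULL && output != NULL);
    for (size_t i = 0; i < out_len; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            acc += input[i + k] * weights[k];
        }
        output[i] = acc;
    }
}
```

Calling this with overlapping buffers would be undefined behaviour, so the caller has to guarantee non-aliasing. In my case that is true, because the Halide buffers are separate allocations.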