
In OpenCL, input arguments get an efficient hardware path when they are declared const global * restrict, as in this piece of handwritten OpenCL code:

__kernel void oclConvolveGlobalMem(const global   float* restrict input,
                                         constant float* restrict filterWeights,
                                         global   float* restrict output)

However, as seen with HL_DEBUG_CODEGEN=1, Halide generates:

// Address spaces for kernel_conv_70_s0_y___block_id_y
#define __address_space__conv__70 __global
#define __address_space__input __global
#define __address_space__kernel __global
__kernel void kernel_conv_70_s0_y___block_id_y(
 const int _conv__70_extent_0,
 const int _conv__70_extent_1,
 const int _conv__70_min_0,
 const int _conv__70_min_1,
 const int _conv__70_stride_1,
 const int _input_min_0,
 const int _input_min_1,
 const int _input_stride_1,
 const int _kernel_min_0,
 const int _kernel_min_1,
 const int _kernel_stride_1,
 __address_space__conv__70 float *_conv__70,
 __address_space__input const float *_input,
 __address_space__kernel const float *_kernel,
 __address_space___shared int16* __shared)

where the input argument is not declared restrict. I expect this to severely limit performance. How do I get Halide to mark the pointers as restrict (the buffers they point to do not alias)?

Klamer Schutte

1 Answer


When did you last update Halide? Halide recently (sort of, October 2016) added restrict to buffer arguments: https://github.com/halide/Halide/pull/1550. The latest binary release does have this change, barely.
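
For readers unfamiliar with the qualifier, here is a minimal plain-C sketch of what restrict promises. It is not Halide's generated code, and convolve_1d and ranges_disjoint are hypothetical names chosen for illustration. The caller guarantees that the output buffer does not overlap the input or the weights, so the compiler may assume a store to out never changes a value it has already loaded from in or w. The debug assertions only check that promise; they are an assumption of this sketch, not something restrict does by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [a, a+na) and [b, b+nb) do not overlap. */
static int ranges_disjoint(const void *a, size_t na, const void *b, size_t nb) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa + na <= pb || pb + nb <= pa;
}

/* Hypothetical 1-D convolution: out[i] = sum over k of in[i + k] * w[k].
 * in must hold n + taps - 1 floats, w must hold taps floats, out n floats.
 * restrict is the caller's promise that out aliases neither in nor w. */
static void convolve_1d(const float *restrict in, const float *restrict w,
                        float *restrict out, size_t n, size_t taps) {
    /* Debug-only check of the no-alias promise (illustrative assumption). */
    assert(ranges_disjoint(in, (n + taps - 1) * sizeof(float),
                           out, n * sizeof(float)));
    assert(ranges_disjoint(w, taps * sizeof(float),
                           out, n * sizeof(float)));

    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k)
            acc += in[i + k] * w[k];
        out[i] = acc;
    }
}
```

Calling it with in = {1, 2, 3, 4}, w = {1, 1}, n = 3 and taps = 2 fills out with the pairwise sums of neighbouring inputs. Passing overlapping buffers would break the promise and is undefined behaviour, which is why a code generator can only add restrict when it knows the buffers are distinct.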

dsharlet
  • I was previously running release 2016_08_04. I have now upgraded to the latest release, Halide 2016/10/22, and it does indeed emit restrict, and my code seems to run significantly faster! – Klamer Schutte Mar 13 '17 at 08:56