0

I'm writing a kernel which calculates a cumulative histogram using a Hillis-Steele Scan pattern. It didn't seem to be working properly - outputting numbers way too high and in the wrong order.

During debugging though, I added a simple printf() function which just prints out the global size for each work item that's run:

kernel void cumulative(global const int *hist, global int *c_hist) {
  int id = get_global_id(0);
  int size = get_global_size(0);
  
  printf("%d\n", size); //MAKES THE CUMULATIVE HISTOGRAM CORRECT
 
  for (int step = 1; step < size; step *= 2) {
    c_hist[id] = hist[id];
    if (id >= step) c_hist[id] += hist[id - step];

    barrier(CLK_GLOBAL_MEM_FENCE);
    global int* tmp = hist; hist = c_hist; c_hist = tmp; 
  }
}

Is someone able to tell me what's happening here? And is there anything that will also fix this code while not printing 1024 over and over to the console?

Using OpenCL 2 with RTX 2060 - if any other info is needed let me know and I'll find it out!

Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
Allie Howe
  • 609
  • 1
  • 5
  • 20
  • 6
    If adding a print statement changes observable behavior in other parts of the code, it is typically a sign that your code contains Undefined Behavior. – 0x5453 Jan 05 '21 at 17:10
  • Sounds to me like you are using unreserved memory. Maybe check boundaries of your `hist` and `c_hist` to figure it out – gkhaos Jan 05 '21 at 17:12
  • What is printf? – A M Jan 05 '21 at 17:21
  • 1
    *Why is this printf function fixing my OpenCL kernel?* -- It isn't fixing anything. You're more than likely moving the corruption bug to another part of the program. I suggest you remove the `printf` statement, reproduce the error, and fix the error with the full knowledge of why the fixes work. – PaulMcKenzie Jan 05 '21 at 17:21
  • 1
    the kernel seems ok at first glance. assuming it works with a single work group and that size is a power of 2. You should add a minimal host code that reproduces this. – Elad Maimoni Jan 05 '21 at 17:23
  • @EladMaimoni it seems like it is more to do with the work group size - global size is 1024 and local size is 256, but I don't think it's just using a single work group, and that's causing synchronisation issues. I'm still pretty new to OpenCL but can kinda see how hat would be a problem – Allie Howe Jan 05 '21 at 17:33
  • 1
    Apparently your code contains race condition. You need some tool which will hep you to find this problem. I'm not familiar with OpenCL, but by googling I've found something like this https://streamhpc.com/blog/2013-05-13/verify-your-opencl-kernel-online/ – Marek R Jan 05 '21 at 17:34

1 Answers1

0

When your code:

kernel void cumulative(global const int *hist, global int *c_hist) {
  int id = get_global_id(0);
  int size = get_global_size(0);
 
  for (int step = 1; step < size; step *= 2) {
    c_hist[id] = hist[id];
    if (id >= step) c_hist[id] += hist[id - step];

    barrier(CLK_GLOBAL_MEM_FENCE);
    global int* tmp = hist; hist = c_hist; c_hist = tmp; 
  }
}

is feed to this site it finds following problems:

10:17: warning: initializing '__global int *' with an expression of type 'const __global int *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
global int* tmp = hist; hist = c_hist; c_hist = tmp;
^ ~~~~
1 warning generated.
00d8ad302736fae9192e5f1d4027e53e.opt.bc: warning: Assuming the arguments 'hist', 'c_hist' of 'cumulative' on line 1 of to be non-aliased; please consider adding a restrict qualifier to these arguments

6:18: error: possible null pointer access for work item (1, 1) in work group (4, 4)
c_hist[id] = hist[id];



6:16: error: possible null pointer access for work item (5, 1) in work group (4, 4)
c_hist[id] = hist[id];



error: possible read-write race on c_hist[1008]:

Write by work item (0, 0) in work group (0, 0), 6:16:
c_hist[id] = hist[id];

Read by work item (20, 1) in work group (4, 4), possible sources are:
6:18:
c_hist[id] = hist[id];

7:32:
if (id >= step) c_hist[id] += hist[id - step];

7:35:
if (id >= step) c_hist[id] += hist[id - step];



error: possible write-write race on hist[1008]:

Write by work item (0, 0) in work group (0, 0), 6:16:
c_hist[id] = hist[id];

Write by work item (20, 1) in work group (4, 4), possible sources are:
6:16:
c_hist[id] = hist[id];

7:32:
if (id >= step) c_hist[id] += hist[id - step];



error: possible write-read race on hist[128]:

Read by work item (0, 0) in work group (4, 0), 6:18:
c_hist[id] = hist[id];

Write by work item (5, 1) in work group (4, 4), possible sources are:
6:16:
c_hist[id] = hist[id];

7:32:
if (id >= step) c_hist[id] += hist[id - step];


GPUVerify kernel analyser finished with 0 verified, 5 errors

Note it reports: error: possible write-read race on hist[128]:

Marek R
  • 32,568
  • 6
  • 55
  • 140