
I have a simple device-side kernel that uses enqueue_kernel. Everything works as intended until I define a clk_event_t, which I want to use to enqueue several kernels that wait for each other. The problem: as soon as the clk_event_t is defined, every element of the "buffer" array ends up as 0.0. If I remove the line with the clk_event_t definition, the code works as intended and scalarMultHelp does exactly what I want it to do. It seems that the buffer pointer is somehow altered, because the lines setting buffer[0] and buffer[1] have no effect AT ALL. My code (simplified):

kernel void ScalarMultiply(global double* A, global double* B, global double* buffer,
               global double* buffer_small, int n) {
  const size_t gid = get_global_id(0);
  buffer[gid] = A[gid] * B[gid];
  if (gid == 0) {
    const int subthread_size = 256;
    ndrange_t ndr = ndrange_1D(n, subthread_size);
    clk_event_t marker_event01; // ERROR: sets buffer to {0,0,0,...} !!!!!!!!
    if (enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT, 
        ndr, 0, NULL, NULL, // here, instead of NULL, I will use marker_event01 as returning event
        ^(local void* target) {
          scalarMultHelp(buffer, buffer_small, (local double*)target);
        }, subthread_size * sizeof(double)) != CLK_SUCCESS) {
      buffer[0] = -1;// not doing anything
      release_event(&marker_event01);
      return;
    }
    buffer[0] = 3.0; // not doing anything
    release_event(&marker_event01);
    buffer[1] = 4.0;// not doing anything
  }
}
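The comment on the enqueue_kernel call says marker_event01 will eventually be passed as the returning event, so that several child kernels can wait for each other. Below is a minimal sketch of that chaining pattern in OpenCL C 2.0, written from the spec's enqueue_kernel overloads rather than tested on the GTX 1050 Ti. It assumes a default device queue exists, that n is a multiple of the 256-item local size, and that buffer_small holds at least n / 256 values. step_one, step_two and ScalarMultiplyChained are hypothetical names standing in for the question's scalarMultHelp logic.

```opencl
// Sketch only (OpenCL C 2.0): chain two device-side enqueues through a
// clk_event_t. step_one, step_two and ScalarMultiplyChained are hypothetical.

// Per-group partial sums of buffer, one value per work-group in buffer_small.
void step_one(global double* buffer, global double* buffer_small,
              local double* scratch) {
  const size_t lid = get_local_id(0);
  scratch[lid] = buffer[get_global_id(0)];
  barrier(CLK_LOCAL_MEM_FENCE);
  if (lid == 0) {
    double sum = 0.0;
    for (size_t i = 0; i < get_local_size(0); ++i) sum += scratch[i];
    buffer_small[get_group_id(0)] = sum;
  }
}

// Single work-item: fold the per-group partial sums into buffer_small[0].
void step_two(global double* buffer_small, int num_groups) {
  double sum = 0.0;
  for (int i = 0; i < num_groups; ++i) sum += buffer_small[i];
  buffer_small[0] = sum;
}

// Assumes n is a multiple of 256 and buffer_small holds at least n / 256 values.
kernel void ScalarMultiplyChained(global double* A, global double* B,
                                  global double* buffer,
                                  global double* buffer_small, int n) {
  const size_t gid = get_global_id(0);
  buffer[gid] = A[gid] * B[gid];
  if (gid == 0) {
    const int subthread_size = 256;
    const ndrange_t ndr = ndrange_1D(n, subthread_size);
    clk_event_t first_done;  // written by the first enqueue_kernel (event_ret)

    // WAIT_KERNEL: this child starts only after every parent work-item has
    // finished, i.e. after all of buffer has been written.
    int err = enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                             ndr, 0, NULL, &first_done,
                             ^(local void* scratch) {
                               step_one(buffer, buffer_small,
                                        (local double*)scratch);
                             },
                             subthread_size * sizeof(double));
    if (err != CLK_SUCCESS) {
      return;  // first_done was never filled in, so it must not be released
    }

    // The second child lists first_done in its wait list (1 event), so it
    // starts only after step_one has completed. Return value ignored for brevity.
    enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange_1D(1), 1, &first_done, NULL,
                   ^{ step_two(buffer_small, n / subthread_size); });

    // release_event takes the event itself, not a pointer to it.
    release_event(first_done);
  }
}
```

In this sketch the event is passed by address only where the spec expects a pointer (event_ret and the wait list), while release_event receives it by value, and only after enqueue_kernel has actually filled it in. The first child uses CLK_ENQUEUE_FLAGS_WAIT_KERNEL because, with CLK_ENQUEUE_FLAGS_NO_WAIT, a child kernel is allowed to begin before the parent kernel has finished executing.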

The kernel runs on an NVIDIA GeForce GTX 1050 Ti. The program is built with the -cl-std=CL2.0 flag to enable OpenCL 2.0 features with the NVIDIA driver.

I've tried to find a solution for quite a while now, but I could not find anything on the web.

Adelhart
  • Unless something recently has changed I don't think Nvidia supports OpenCL 2.0 – doqtor Jun 25 '20 at 06:43
  • @doqtor NVIDIA has partially supported OpenCL 2.0 since 2017. (There are some limitations, for example the local size must divide the global size; that's why the support is not "official".) My kernel compiles and runs fine on NVIDIA with a lot of OpenCL 2.0 keywords (like enqueue_kernel, ndrange_t), so OpenCL 2.0 on NVIDIA is definitely possible. The problem might still be NVIDIA-related, though, so I'll try it on another GPU brand as soon as possible. – Adelhart Jun 25 '20 at 08:17
