I have a simple device-side kernel that uses enqueue_kernel. Everything works as intended until I declare a clk_event_t, which I want to use to enqueue several kernels that wait for each other. The problem: as soon as the clk_event_t is declared, every element of the "buffer" array ends up as 0.0. If I remove the line with the clk_event_t declaration, the code works as intended and scalarMultHelp does exactly what I want it to do. It seems that the buffer pointer is somehow altered, because the lines setting buffer[0] and buffer[1] have no effect at all. My code (simplified):
    kernel void ScalarMultiply(global double* A, global double* B, global double* buffer,
                               global double* buffer_small, int n) {
        const size_t gid = get_global_id(0);
        buffer[gid] = A[gid] * B[gid];
        if (gid == 0) {
            const int subthread_size = 256;
            ndrange_t ndr = ndrange_1D(n, subthread_size);
            clk_event_t marker_event01; // ERROR: sets buffer to {0,0,0,...} !!!!!!!!
            if (enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT,
                    ndr, 0, NULL, NULL, // here, instead of NULL, I will use marker_event01 as returning event
                    ^(local void* target){
                        scalarMultHelp(buffer, buffer_small, (local double*)target);
                    }, subthread_size*sizeof(double) ) != CLK_SUCCESS) {
                buffer[0] = -1; // not doing anything
                release_event(&marker_event01);
                return;
            }
            buffer[0] = 3.0; // not doing anything
            release_event(&marker_event01);
            buffer[1] = 4.0; // not doing anything
        }
    }
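For context, this is roughly the chaining I am aiming for once the event works. It is only a minimal sketch, not tested code: the kernel name ScalarMultiplyChained, the helper secondStage and both helper prototypes are placeholders I made up for illustration (the scalarMultHelp signature is inferred from the call above). As far as I can tell from the OpenCL C 2.0 specification, enqueue_kernel takes a clk_event_t* for the returned event, while release_event takes the event by value.

    // Assumed prototypes, for illustration only.
    void scalarMultHelp(global double* buffer, global double* buffer_small, local double* target);
    void secondStage(global double* buffer_small); // hypothetical follow-up step

    kernel void ScalarMultiplyChained(global double* A, global double* B, global double* buffer,
                                      global double* buffer_small, int n) {
        const size_t gid = get_global_id(0);
        buffer[gid] = A[gid] * B[gid];
        if (gid == 0) {
            const int subthread_size = 256;
            ndrange_t ndr = ndrange_1D(n, subthread_size);
            clk_event_t marker_event01;
            // First launch: hand back an event through the event_ret argument.
            if (enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT,
                    ndr, 0, NULL, &marker_event01,
                    ^(local void* target){
                        scalarMultHelp(buffer, buffer_small, (local double*)target);
                    }, subthread_size*sizeof(double)) != CLK_SUCCESS) {
                // Assumption: no event is returned on failure, so nothing to release here.
                return;
            }
            // Second launch: waits on the first one through marker_event01.
            enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT,
                    ndrange_1D(1), 1, &marker_event01, NULL,
                    ^{ secondStage(buffer_small); });
            release_event(marker_event01); // passed by value, not by pointer
        }
    }

The event is released only after the second enqueue, so it stays valid while it is still in a wait list.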
The kernel runs on an NVIDIA GeForce GTX 1050 Ti. The program is built with the -cl-std=CL2.0 flag to enable OpenCL 2.0 features with the NVIDIA driver.
I have been looking for a solution for quite a while now, but could not find anything on the web.