I have a linear int array arr, which is in CUDA global memory. I want to set sub-arrays of arr to defined values. The sub-array start indices are given by the starts array, while the length of each sub-array is given in the counts array.
What I want to do is set every element of sub-array i, which starts at index starts[i] and spans counts[i] elements, to the value starts[i]. That is, the operation is:
arr[starts[i]: starts[i]+counts[i]] = starts[i]
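To make the intended result concrete, here is a minimal sequential sketch of the operation in host-side C++ (no CUDA involved). The name fill_subarrays_reference is only an illustrative name, not something from my actual code, and the sketch assumes every sub-array lies entirely inside arr:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for: arr[starts[i] : starts[i] + counts[i]] = starts[i]
// Hypothetical helper for illustration; assumes each range fits inside arr.
void fill_subarrays_reference(const std::vector<int>& starts,
                              const std::vector<int>& counts,
                              std::vector<int>& arr)
{
    assert(starts.size() == counts.size());
    for (std::size_t i = 0; i < starts.size(); ++i) {
        const int begin = starts[i];
        for (int j = 0; j < counts[i]; ++j) {
            // Every element of sub-array i receives the integer value starts[i].
            arr[begin + j] = begin;
        }
    }
}
```

For example, with starts = {0, 3, 6}, counts = {2, 3, 1} and an 8-element arr initialised to -1, the result I expect is {0, 0, -1, 3, 3, 3, 6, -1}.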
I thought of using memset() in the kernel to set the values. However, the values are not written correctly (the array elements end up with seemingly random values). The code I am using is:
    #include <stdlib.h>

    __global__ void kern(int* starts, int* counts, int* arr, int* numels)
    {
        unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
        if (idx >= numels[0])
            return;
        const int val = starts[idx];
        memset(&arr[val], val, sizeof(arr[0]) * counts[idx]);
        __syncthreads();
    }
Please note that numels[0] contains the number of elements in the starts array.
I have checked the code with cuda-memcheck but didn't get any errors. I am using PyCUDA, if it's relevant. I am probably misunderstanding the usage of memset() here, as I am still learning CUDA.
Can you please suggest a way to correct this, or another efficient way of doing this operation?
P.S.: I know that thrust::fill() can probably do this well, but since I am learning CUDA, I would like to know how to do this without using external libraries.