I thought I knew how to write some clean cuda code. Until I tried to make a simple template class and use it in a simple kernel. I've been trouble shooting for days. Every single thread I've visited made me feel a little more stupid.
For error checking I used this
Here is my class.h:
#pragma once
template <typename T>
class MyArray
{
public:
const int size;
T *data;
__host__ MyArray(int size); //gpuErrchk(cudaMalloc(&data, size * sizeof(T)));
__device__ __host__ T GetValue(int); //return data[i]
__device__ __host__ void SetValue(T, int); //data[i] = val;
__device__ __host__ T& operator()(int); //return data[i];
~MyArray(); //gpuErrchk(cudaFree(data));
};
template class MyArray<double>;
The relevant content of class.cu is in the comments. If you think the whole thing is relevant Id be happy to add it.
Now for the main class:
__global__ void test(MyArray<double> array, double *data, int size)
{
int j = threadIdx.x;
//array.SetValue(1, j); //doesn't work
//array(j) = 1; //doesn't work
//array.data[j] = 1; //doesn't work
data[j] = 1; //This does work !
printf("Reach this code\n");
}
}
int main(int argc, char **argv)
{
MyArray x(20);
test<<<1, 20>>>(x, x.data, 20);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
}
When I say "doesn't work", I mean that the program stops there (before reaching the printf) without outputting any error. Plus I get the following error both from cudaDeviceSynchronize
and from cudaFree
:
an illegal memory access was encountered
What I can't understand is that there should be no issue with memory management since sending the array directly to the kernel works fine. So why doesn't it work when I send a class and try to access the classes data? And why do I receive no warning or error message when clearly my code bumped into some error?
Here is the output of nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85