I cannot figure out the best way to ensure that the memory used by my kernel is constant memory. There is a similar question at http://stackoverflow...r-pleasant-way. I am working with a GTX 580 and compiling only for compute capability 2.0. My kernel looks like
__global__ void Foo(const int *src, float *result) {...}
and I execute the following code on the host:
cudaMalloc(&src, size);
cudaMemcpy(src, hostSrc, size, cudaMemcpyHostToDevice);
Foo<<<...>>>(src, result);
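For reference, here is a minimal self-contained sketch of this first approach as I understand it (the kernel body, the `n`/`blocks`/`threads` names, and the launch configuration are my own placeholders, not part of my real code):

```cuda
__global__ void Foo(const int *src, float *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    result[i] = (float)src[i];  // placeholder body
}

// host side
int   *src;                       // device pointer
float *result;                    // device pointer
size_t size = n * sizeof(int);    // n elements, n is a placeholder
cudaMalloc(&src, size);
cudaMalloc(&result, n * sizeof(float));
cudaMemcpy(src, hostSrc, size, cudaMemcpyHostToDevice);
Foo<<<blocks, threads>>>(src, result);  // blocks/threads chosen so blocks*threads == n
```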
The alternative is to add
__constant__ int src[size];
to the .cu file, remove the src pointer from the kernel, and execute
cudaMemcpyToSymbol("src", hostSrc, size, 0, cudaMemcpyHostToDevice);
Foo<<<...>>>(result);
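A sketch of this second approach, again with my own placeholder kernel body; note that `MAX_SIZE` is a compile-time upper bound I would have to invent, since a `__constant__` array cannot be sized at runtime:

```cuda
#define MAX_SIZE 4096              // compile-time bound (my assumption); constant memory is 64 KB total

__constant__ int src[MAX_SIZE];    // file-scope declaration in the .cu file

__global__ void Foo(float *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    result[i] = (float)src[i];     // placeholder body; src is read via the constant cache
}

// host side: copy only the 'size' bytes actually used
cudaMemcpyToSymbol(src, hostSrc, size, 0, cudaMemcpyHostToDevice);
Foo<<<blocks, threads>>>(result);
```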
Are these two ways equivalent, or does the first one not guarantee the use of constant memory instead of global memory? size changes dynamically, so the second way is not handy in my case.