In my kernel function, I want two vectors in shared memory, each of length size (that is, sizeof(float)*size bytes each).
Since a shared memory array cannot be declared with a fixed size inside the kernel when that size is only known at run time, I had to allocate it dynamically, like:
myKernel<<<numBlocks, numThreads, 2*sizeof(float)*size>>> (...);
and, inside the kernel:
extern __shared__ float row[];
extern __shared__ float results[];
But this doesn't work.
Instead, I declared a single vector, extern __shared__ float rowresults[], that holds all the data and uses the full 2*size floats allocated. Accesses to row stay the same, and accesses to results become rowresults[size+previousIndex]. This does work.
It is not a big problem, since I get the expected results anyway, but is there any way to split my dynamically allocated shared memory into two (or more) separate variables? Just for beauty.
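For reference, one common way to get two named arrays out of a single dynamic allocation is to keep the one extern __shared__ declaration and derive plain pointers from it. The following is only a minimal sketch under assumptions: the kernel body, the in/out parameters, and the host code in main are hypothetical and exist just to make the example complete; only the launch configuration and the names row, results, rowresults and size come from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the parameters and the computation are illustrative only.
__global__ void myKernel(const float* in, float* out, int size)
{
    // The single dynamically sized shared memory block for this thread block.
    extern __shared__ float rowresults[];

    // Two named views into that one allocation.
    float* row     = rowresults;         // elements [0, size)
    float* results = rowresults + size;  // elements [size, 2*size)

    int i = threadIdx.x;
    if (i < size) {
        row[i] = in[i];
    }
    __syncthreads();  // make every row[] write visible to the whole block

    if (i < size) {
        // Reads an element written by another thread, then stores into results.
        results[i] = 2.0f * row[size - 1 - i];
        out[i] = results[i];
    }
}

int main()
{
    const int size = 8;  // assumed small size, one thread per element
    float h_in[size], h_out[size];
    for (int i = 0; i < size; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  size * sizeof(float));
    cudaMalloc(&d_out, size * sizeof(float));
    cudaMemcpy(d_in, h_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory for both vectors.
    myKernel<<<1, size, 2 * sizeof(float) * size>>>(d_in, d_out, size);
    cudaMemcpy(h_out, d_out, size * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < size; ++i) printf("%g ", h_out[i]);
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

With this layout, row[i] and results[i] can be indexed independently, with no size+previousIndex arithmetic at each use. If the two arrays had different element types, the offset of the second one would also have to respect that type's alignment.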