
In a kernel function, I want two vectors in shared memory, each of length `size` (i.e., `sizeof(float)*size` bytes).

Since a shared-memory array can't be declared inside the kernel with a size that is only known at runtime, I had to allocate it dynamically, like this:

    myKernel<<<numBlocks, numThreads, 2*sizeof(float)*size>>> (...);  

and, inside the kernel:

    extern __shared__ float row[];
    extern __shared__ float results[];

But this doesn't work.

Instead, I declared a single array, `extern __shared__ float rowresults[]`, that holds all the data in the `2*size` allocation. Accesses to `row` stay the same, and accesses to `results` become `rowresults[size+previousIndex]`. This does work.
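For reference, a minimal sketch of that single-array workaround. The kernel name `rowResultsKernel`, the host helper `runRowResults`, and the placeholder computation (doubling each element) are hypothetical; the only point is the `size + i` offset into the second half of the buffer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The question's workaround: one dynamic shared array of 2*size floats.
// Indices [0, size) act as "row"; indices [size, 2*size) act as "results".
__global__ void rowResultsKernel(const float* in, float* out, int size)
{
    extern __shared__ float rowresults[];

    // Load this block's row into the first half.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        rowresults[i] = in[blockIdx.x * size + i];
    __syncthreads();  // make the whole row visible to every thread

    // "results" accesses are offset by size (placeholder: double each value).
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        rowresults[size + i] = 2.0f * rowresults[i];

    // Each thread reads back only the results it wrote itself.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        out[blockIdx.x * size + i] = rowresults[size + i];
}

// Hypothetical host helper: launches the kernel and checks out == 2 * in.
bool runRowResults(int numBlocks, int size)
{
    const int n = numBlocks * size;
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Same dynamic shared-memory size as in the question.
    rowResultsKernel<<<numBlocks, 128, 2 * sizeof(float) * size>>>(d_in, d_out, size);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i)
        ok = h_out[i] == 2.0f * h_in[i];
    return ok;
}
```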

It's not a big problem, since I get the expected results anyway, but is there any way to split dynamically allocated shared memory into two (or more) separate variables? Just for beauty.

paleonix
BobCormorano

1 Answer


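Every variable declared as `extern __shared__` starts at the same address, so in your kernel `row` and `results` overlap completely. The CUDA C Programming Guide section on `__shared__` includes examples where you allocate multiple arrays from dynamically allocated shared memory: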

    extern __shared__ float array[];
    __device__ void func()      // __device__ or __global__ function
    {
        short* array0 = (short*)array;
        float* array1 = (float*)&array0[128];
        int*   array2 =   (int*)&array1[64];
    }

Since each sub-array is just a pointer to an element of the shared array, you can adapt this to use dynamic offsets computed from `size` at runtime instead of the static offsets in the example. The guide also notes that each pointer must be aligned to the type it points to, which shouldn't be an issue in your case since both arrays hold `float`s.
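A minimal sketch of that adaptation for your case, assuming both arrays hold `size` floats and the launch passes `2*sizeof(float)*size` bytes as in the question. The names `splitKernel` and `runSplit`, and the mirrored-sum computation, are hypothetical placeholders:

```cuda
#include <cuda_runtime.h>
#include <vector>

// One dynamic allocation, split into two named arrays at a runtime offset.
__global__ void splitKernel(const float* in, float* out, int size)
{
    extern __shared__ float smem[];   // 2*size floats, sized at launch
    float* row     = smem;            // elements [0, size)
    float* results = row + size;      // elements [size, 2*size)

    for (int i = threadIdx.x; i < size; i += blockDim.x)
        row[i] = in[blockIdx.x * size + i];
    __syncthreads();                  // row is now complete for all threads

    // Placeholder computation: each result adds the mirrored row element,
    // which was loaded by a different thread (hence the barrier above).
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        results[i] = row[i] + row[size - 1 - i];

    // Each thread reads back only the results it wrote itself.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        out[blockIdx.x * size + i] = results[i];
}

// Hypothetical host helper: launches the kernel and checks every output.
bool runSplit(int numBlocks, int size)
{
    const int n = numBlocks * size;
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Same launch configuration as in the question.
    splitKernel<<<numBlocks, 128, 2 * sizeof(float) * size>>>(d_in, d_out, size);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int b = 0; ok && b < numBlocks; ++b)
        for (int i = 0; ok && i < size; ++i)
            ok = h_out[b * size + i] ==
                 h_in[b * size + i] + h_in[b * size + size - 1 - i];
    return ok;
}
```

A third array would simply start at `results + size`, with the launch's shared-memory size increased to match.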

lmortenson
  • A more beautiful way to do it; that is what I was looking for. Thanks. – BobCormorano Mar 15 '13 at 15:20
  • @lmortenson Does this method preserve the alignment benefits for elements in memory? Will a `float4` be accessed at `32byte` alignment if you have allocated the memory as `sizeof(float)`, or will the access follow the `float` alignment? Thanks. – BugShotGG Sep 12 '14 at 11:09
  • In your example, you know the sizes of your arrays; so why not define a `struct` instead, and reinterpret the shared memory as that struct? – einpoklum Feb 17 '17 at 23:23
  • @einpoklum The example I copied from the C guide used static offsets, but the asker didn't know the sizes of their arrays ahead of time, so converting it to a struct would be less useful. – lmortenson Feb 23 '17 at 21:21
  • Should `array` be declared outside of `func`, or could it be inside the function? – auraham Apr 27 '17 at 17:53
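On the alignment question in the comments: casting a pointer into the buffer does not realign anything, so each sub-array has to start at an offset that is a multiple of its type's alignment (16 bytes for `float4`). A sketch under those assumptions, with hypothetical names (`alignUp`, `mixedBytes`, `mixedKernel`, `runMixed`), that declares the buffer as `float4` so its base is 16-byte aligned and pads the offset that follows a `short` array:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Round offset up to the next multiple of alignment (a power of two).
__host__ __device__ inline size_t alignUp(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Bytes needed for the layout below: nShort shorts, padding, nVec float4s.
__host__ __device__ inline size_t mixedBytes(int nShort, int nVec)
{
    return alignUp(nShort * sizeof(short), alignof(float4)) + nVec * sizeof(float4);
}

// Assumes nShort > 0.
__global__ void mixedKernel(int nShort, int nVec, float* out)
{
    // Declaring the buffer as float4 makes its base 16-byte aligned.
    extern __shared__ float4 smemRaw[];
    unsigned char* base = reinterpret_cast<unsigned char*>(smemRaw);

    short* flags = reinterpret_cast<short*>(base);
    // E.g. with nShort = 3 (6 bytes), vec would be misaligned without alignUp.
    float4* vec = reinterpret_cast<float4*>(
        base + alignUp(nShort * sizeof(short), alignof(float4)));

    for (int i = threadIdx.x; i < nShort; i += blockDim.x)
        flags[i] = static_cast<short>(i);
    for (int i = threadIdx.x; i < nVec; i += blockDim.x) {
        float f = static_cast<float>(i);
        vec[i] = make_float4(f, f, f, f);
    }
    __syncthreads();  // flags[i % nShort] may have been written by another thread

    for (int i = threadIdx.x; i < nVec; i += blockDim.x)
        out[i] = vec[i].x + vec[i].w + flags[i % nShort];
}

// Hypothetical host helper: launches one block and checks the results.
bool runMixed(int nShort, int nVec)
{
    float* d_out = nullptr;
    cudaMalloc(&d_out, nVec * sizeof(float));
    mixedKernel<<<1, 64, mixedBytes(nShort, nVec)>>>(nShort, nVec, d_out);

    std::vector<float> h_out(nVec);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h_out.data(), d_out, nVec * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    for (int i = 0; ok && i < nVec; ++i)
        ok = h_out[i] == 2.0f * i + (i % nShort);
    return ok;
}
```

Placing the most strictly aligned arrays first avoids padding altogether; the host has to compute the total byte count from the same layout, which is why `mixedBytes` is callable from both host and device in this sketch.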