
I am writing some CUDA code to run on the device. The code will use two lookup tables of constant values. The first of these is an array of 256 unsigned 16-bit ints, which I declare as:

__constant__ 
uint16_t edgeTable[256]={
   0x000,
   0x019,
   ... etc.
};

And this seems to compile fine.

The second is a fixed size array of dim3 and I tried this:

__constant__
dim3 offsets[8] = {
    {0, 0, 0}, {0, 0, 1}, {0, 1, 0},
    ... etc 
};

The compiler objects to this with the error message:

error: dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.

Perhaps I misunderstand dynamic initialisation, but it seems to me that this is static initialisation: the compiler can work out the sizes of everything and all values are provided.

What am I missing here?

How can I achieve what I'm trying to do?

Thanks

I'm using the CUDA 7.5 toolkit on Ubuntu 14.04 with gcc 4.8.4.

Dave Durbin
  • Possibly silly question -- did you pass the flag to enable C++11? – Jul 10 '16 at 16:15
  • Parameterized constructor initialisation (dim3 is a class) isn't supported in C++ before C++11 – talonmies Jul 10 '16 at 16:35
  • Maybe the answer to your question is that this is actually a form of dynamic initialization (see comments above) and dynamic initialization is not supported for these variable types, just as the compiler has indicated, regardless of c++11 compiler switches. As a possible workaround, you could create an array of `int` (static initialization of `__constant__ int`) and then cast it to an array of `dim3` in your kernel code, perhaps like [this](http://pastebin.com/FrMZxL8T). I think you probably could already figure out such a workaround yourself, so, I'm not suggesting this is an answer. – Robert Crovella Jul 10 '16 at 20:15
  • DOH! Thanks. I did pass -std=c++11 however completely failed to think about the fact that a struct is a class and would therefore of course be dynamically initialised. I think @Robert Crovella your answer is the only thing I can do.. – Dave Durbin Jul 11 '16 at 07:17

1 Answer


The important feature of this problem is that CUDA uses a C++ compilation model, and dim3 is treated as a class. So while:

dim3 foo = {1,1,1};

is legal in C++11, because of parameterised constructor initialisation support, this:

__constant__ dim3 foo = {1,1,1};

isn't, because that implies dynamic initialisation of a constant memory object, and the CUDA execution model doesn't permit that.

If the constant memory aspect is important to you and you want the convenience of dim3, you could do something like this:

#include <cstdio>

__constant__ int offsets[3*8];

__global__ void kernel()
{
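    if (threadIdx.x < 8) {
        // Reinterpret three consecutive ints as one dim3 (three unsigned ints)
        dim3 val = *reinterpret_cast<dim3*>(&offsets[3*threadIdx.x]);
        // threadIdx.x and the dim3 members are unsigned, so print with %u
        printf("%u (%u,%u,%u)\n", threadIdx.x, val.x, val.y, val.z);
    }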
}

void setup_offsets()
{
    // This requires C++11 support
    dim3 ovals[8] = { {0,0,0}, 
                      {1,0,0}, {0,1,0}, {0,0,1},
                      {1,1,0}, {1,0,1}, {0,1,1},
                      {1,1,1} };

    // Copy the host-side table into constant memory before launching the kernel
    cudaMemcpyToSymbol(offsets, &ovals[0], sizeof(ovals));
}

int main(void)
{
    setup_offsets();
    kernel<<<1,8>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
    return 0;
}

which is a bit hacky, but probably the best you can hope for under the circumstances. Looking at the PTX for that code, the compiler has correctly emitted ld.const.u32 to fetch each member of the dim3.
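A variation on the same idea, offered only as a sketch and not as something taken from the code above: the built-in uint3 type is a plain struct with no constructors, so a __constant__ array of uint3 can be brace-initialised statically, with no cudaMemcpyToSymbol call and no cast. Each thread then constructs a dim3 from the three members where it is needed. The kernel name pack_offsets, the helper count_offset_mismatches and the packing scheme x + 2*y + 4*z are hypothetical, chosen here only so the host can check what the device read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// uint3 has no constructors, so this brace initialisation is static:
// the values are stored in the module image, not written at run time.
__constant__ uint3 offsets[8] = {
    {0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1},
    {1, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 1}
};

// Each of the first 8 threads builds a dim3 from its constant-memory entry
// and packs it into one integer (x + 2*y + 4*z) so the host can verify it.
__global__ void pack_offsets(unsigned int *out)
{
    if (threadIdx.x < 8) {
        const uint3 o = offsets[threadIdx.x];
        const dim3 val(o.x, o.y, o.z);  // construct the dim3 at the point of use
        out[threadIdx.x] = val.x + 2u * val.y + 4u * val.z;
    }
}

// Hypothetical helper: returns how many entries differ from the table above,
// or -1 if a CUDA runtime call fails.
int count_offset_mismatches()
{
    const unsigned int expected[8] = {0, 1, 2, 4, 3, 5, 6, 7};
    unsigned int host[8] = {0};
    unsigned int *dev = NULL;

    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1;
    pack_offsets<<<1, 8>>>(dev);
    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < 8; ++i)
        if (host[i] != expected[i]) ++mismatches;
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", count_offset_mismatches());
    return 0;
}
```

The trade-off is that the table is stored as uint3 rather than dim3, so any code that wants a dim3 has to build one explicitly, as the kernel does here.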

talonmies