
My kernel needs a list/array of Configuration objects as an input parameter. In fact, I have a list/array of such lists/arrays, and one of them is passed to the kernel. These Configuration objects are prepared on the host and do not change, so this seems like a perfect use for constant memory. Honestly, though, I don't really understand how to do it.

I've sketched my idea in the code draft below. Basically, I see two ways to define/pass the lists:

  • Define them as fixed-length arrays and pass them by value to the kernel
  • Define them as pointers and just pass a pointer to the kernel (after copying the data to the device first, of course)

Which method should I use, and how should I modify the code below to make sure constant memory is used?

I expect each list to typically be smaller than 200-300 bytes. If I made all lists the same size, I would probably choose 512 bytes or 1 KB.

class Configuration{
  // some constants
};

// We need a list of lists of Configurations; these could be implemented either as...
Configuration a[10][100]; // a fixed-length array, or...
Configuration ** b;       // a dynamic array of pointers to arrays

// The parameter takes an array of Configuration, either as a pointer or directly as an array
__global__ void kernel(Configuration * config, int length){

}

// Following the example above, we use the pointer version. It could also be called directly with a[i]
kernel<<<...>>>(b[i], lengthOfB[i]);
Michael

1 Answer


If you want the data to be in __constant__ memory (which may not be a smart move, depending on how you access the data in the kernel), then the first approach (fixed-length array) is the only sensible one. For simplicity, I would also flatten the two-dimensional array into a one-dimensional array, which makes it easier to index and to copy.
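A minimal host-side sketch of the flattening, assuming `Configuration` is a plain struct with a single `int cdata` member (as in the answer's example). `FlatLists`, `flatten` and `fixedIndex` are hypothetical names for illustration. If the lists differ in length (as `lengthOfB[i]` in the question suggests), an offset table records where each list starts; for the fixed-size `a[10][100]` layout the index is plain arithmetic.

```cuda
#include <cassert>
#include <vector>

// Assumed stand-in for the question's Configuration.
struct Configuration {
    int cdata;
};

// Hypothetical helper type: all lists packed back to back in one array.
// offsets[i] is the index of the first element of list i, and there is one
// extra trailing entry equal to the total count, so list i occupies
// the half-open range [offsets[i], offsets[i + 1]).
struct FlatLists {
    std::vector<Configuration> data;
    std::vector<int> offsets;
};

// Packs a list of lists into a single contiguous array (host code).
FlatLists flatten(const std::vector<std::vector<Configuration>>& lists) {
    FlatLists out;
    out.offsets.push_back(0);
    for (const auto& list : lists) {
        out.data.insert(out.data.end(), list.begin(), list.end());
        out.offsets.push_back(static_cast<int>(out.data.size()));
    }
    return out;
}

// Fixed-size layout (a[10][100] flattened to a[10 * 100]):
// element `item` of list `list` sits at list * listLength + item.
inline int fixedIndex(int list, int item, int listLength) {
    return list * listLength + item;
}
```

The flat `data` array can then be copied into a `__constant__` array of sufficient size with a single `cudaMemcpyToSymbol` call.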

In addition to being read-only, __constant__ memory is designed to be efficient when every thread in a warp requests the same value; requests for different addresses within a warp are serialized. Your question didn't mention your access pattern, so you may want to refer to this question/answer for an explanation and examples.
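A minimal sketch contrasting the two access patterns, assuming a hypothetical `__constant__ int c_table[256]` (the names `c_table`, `uniformRead` and `divergentRead` are illustrative). Both kernels produce correct results; the difference is only in how the constant cache can serve them.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical table size for illustration.
constexpr int kTableSize = 256;
__constant__ int c_table[kTableSize];

// Uniform access: the index depends only on blockIdx.x, so all threads of a
// warp read the SAME address and the value is broadcast to the whole warp.
__global__ void uniformRead(int* out) {
    int v = c_table[blockIdx.x % kTableSize];
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;
}

// Divergent access: the index depends on threadIdx.x, so threads of a warp
// read DIFFERENT addresses and the request is split into one request per
// distinct address. For this pattern an ordinary global array (possibly via
// the read-only cache) is usually a better fit than __constant__ memory.
__global__ void divergentRead(int* out) {
    int v = c_table[threadIdx.x % kTableSize];
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;
}
```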

If you went with the pointer approach, (presumably) only the pointer itself would be in constant memory, not the data it points to, and that is not what you want.

If you use __constant__ memory, there is no need to also pass a pointer to the data as a kernel parameter: the __constant__ declaration has global (file) scope, so the kernel can reference it directly.

Something like this might work:

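#include <cuda_runtime.h>

class Configuration{
public:        // members must be public so the kernel can read them
  // some constants
  int cdata;
};             // a class definition needs a terminating semicolon

__constant__ Configuration const_data[10*100];

// ***setup in host code (setup_const_data is a hypothetical helper name)
void setup_const_data(){
  Configuration h_data[10*100];
  // fill in h_data ...
  // then copy to device: pass the __constant__ symbol itself, not a pointer
  cudaMemcpyToSymbol(const_data, h_data, sizeof(h_data));
}
// ***

//use in kernel code
__global__ void mykernel(){

  // no pointer parameter needed: const_data is visible at file scope
  int my_data = const_data[5].cdata;

}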
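To mirror the question's `kernel(b[i], lengthOfB[i])` call, one option (a sketch, not the only possible layout) is to keep all lists in one flattened `__constant__` array with a fixed maximum length per list, and pass only the list number and its length. The names `c_configs`, `kNumLists`, `kMaxPerList` and `useList`, and the summing loop, are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Equivalent to the class above with a public member.
struct Configuration {
    int cdata;
};

// Hypothetical capacity: 10 lists of up to 100 entries each, flattened.
constexpr int kNumLists = 10;
constexpr int kMaxPerList = 100;
__constant__ Configuration c_configs[kNumLists * kMaxPerList];

// Instead of a pointer, the kernel receives which list to use and how many
// entries it holds (assumed to be <= kMaxPerList). The data is read
// directly from the file-scope __constant__ array.
__global__ void useList(int list, int length, int* out) {
    int sum = 0;
    // Every thread walks the same indices, so each read is warp-uniform,
    // which is the access pattern constant memory handles best.
    for (int k = 0; k < length; ++k) {
        sum += c_configs[list * kMaxPerList + k].cdata;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}
```

On the host, `useList<<<grid, block>>>(i, lengthOfB[i], d_out)` would then take the place of passing `b[i]`.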

Note that __constant__ memory is limited to 64 KB in total.
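With the sizes from the question (ten lists, each padded to 1 KB), the table needs 10 × 1 KB = 10 KB, comfortably below 64 KB. A compile-time check such as this sketch catches the case where `Configuration` later grows; the constant names and the single-`int` layout are assumptions.

```cuda
#include <cassert>
#include <cstddef>

// Assumed layout for illustration; the real Configuration may be larger.
struct Configuration {
    int cdata;
};

// Total __constant__ memory, as stated above (64 KB).
constexpr std::size_t kConstantMemoryBytes = 64 * 1024;

// Hypothetical sizing from the question: 10 lists, each padded to 1 KB.
constexpr std::size_t kNumLists = 10;
constexpr std::size_t kBytesPerList = 1024;
constexpr std::size_t kEntriesPerList = kBytesPerList / sizeof(Configuration);
constexpr std::size_t kTableBytes =
    kNumLists * kEntriesPerList * sizeof(Configuration);

// Compilation fails here if the flattened table would not fit.
// Other __constant__ variables also use part of the same space.
static_assert(kTableBytes <= kConstantMemoryBytes,
              "configuration table does not fit in __constant__ memory");

__constant__ Configuration c_table[kNumLists * kEntriesPerList];
```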

Robert Crovella
  • Thanks for this answer. Do I need to define it as a global constant? Is there no other way to use the constant cache? – Michael Nov 25 '14 at 10:15
  • Yes, `__constant__` symbols must be defined in translation unit scope. – user703016 Nov 25 '14 at 10:28
  • If you have a cc3.x or higher device, you may also want to see about using the ["read-only" cache](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-memory-3-0) or `__ldg` intrinsic. Depending on your actual access patterns, it may have substantially higher throughput than `__constant__` memory. And it can be used directly with an ordinary global pointer passed as a kernel parameter. You'll want to be sure to carefully use `__restrict__` and `const` to decorate your pointers, starting with the kernel parameter itself. – Robert Crovella Nov 25 '14 at 10:44
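A minimal sketch of the read-only-cache alternative from the last comment, assuming a device that supports `__ldg` (compute capability 3.5 or later). The data stays in ordinary global memory allocated with `cudaMalloc` and is passed as a `const ... __restrict__` pointer; `useListLdg` is a hypothetical name.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Equivalent to the answer's Configuration with a public member.
struct Configuration {
    int cdata;
};

// The list is an ordinary global-memory array passed as a kernel parameter.
// const + __restrict__ tell the compiler the data is read-only and not
// aliased by `out`, which allows it to use the read-only data cache.
__global__ void useListLdg(const Configuration* __restrict__ cfg, int length,
                           int* __restrict__ out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length) {
        // Threads read different entries here: a pattern the read-only
        // cache handles well but __constant__ memory would serialize.
        // __ldg explicitly requests a load through the read-only cache.
        out[i] = __ldg(&cfg[i].cdata);
    }
}
```

Unlike the `__constant__` version, this keeps the pointer-passing interface from the question (`b[i]`, `lengthOfB[i]`), as long as `b[i]` points to device memory.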