
I need to use an array of structs in constant memory for my kernel, where the actual size of the array is not known until runtime. As answered in Correct way to use __constant__ memory on CUDA?, constant memory is allocated at compile time, so the array needs to be declared as:

__constant__ SKY_GRID_TYPE const_patch_grid_lat[5];

where the size is already fixed. But because the size I need depends on other calculations done at runtime, it seems I cannot use constant memory.

That answer suggests using texture memory instead, which it says "can be set dynamically and [is] cached." However, the data type I need in my memory is an array of structs, and according to Structure in Texture memory on CUDA, texture memory only supports CUDA built-in types.

So is there a workaround for this? Constant memory would have been perfect for my array of structs, but its size is fixed at compile time, so that doesn't work. Texture memory can be sized dynamically, but it allows nothing but CUDA built-in types. Is there anything else I could use, or some clever way to get around this?

talonmies
tripatheea

1 Answer


Constant memory is limited to a maximum of 64 kbytes. As far as I know, there is no downside to allocating all 64 kbytes.

Just declare your array at the maximum size (64 kbytes divided by the size of your struct) and use however much of it you need at runtime. This also assumes that access will be uniform across the warp for each access.
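A minimal sketch of that approach, assuming a hypothetical two-field SKY_GRID_TYPE; the names MAX_PATCHES, const_num_patches, and upload_patches are illustrative, not from the question:

```cuda
#include <cuda_runtime.h>

struct SKY_GRID_TYPE { float lat; float lon; };  // placeholder definition

// Reserve the largest element count you are willing to budget out of
// the 64 KB of constant memory available per device.
#define MAX_PATCHES 1024
__constant__ SKY_GRID_TYPE const_patch_grid_lat[MAX_PATCHES];
__constant__ int const_num_patches;  // actual element count, set at runtime

__global__ void use_patches()
{
    // Only the first const_num_patches entries hold valid data.
    for (int i = 0; i < const_num_patches; ++i) {
        SKY_GRID_TYPE p = const_patch_grid_lat[i];
        // ... use p ...
    }
}

// n is computed at runtime; copy only the n elements actually needed.
void upload_patches(const SKY_GRID_TYPE *host_data, int n)
{
    cudaMemcpyToSymbol(const_patch_grid_lat, host_data,
                       n * sizeof(SKY_GRID_TYPE));
    cudaMemcpyToSymbol(const_num_patches, &n, sizeof(int));
}
```

The array's compile-time size is just an upper bound; cudaMemcpyToSymbol is free to fill only a prefix of it.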

If you need more than 64 kbytes, then of course this won't work, but that would call into question the whole premise of your question.

For large constant areas, and/or for situations where you don't have uniform access, my recommendation for cc3.5 and newer GPUs is to use the read-only cache mechanism (const __restrict__ or __ldg()).
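A sketch of that read-only cache path, with an illustrative struct and kernel (not from the question); marking the pointer const __restrict__ lets the compiler route loads through the read-only cache on cc3.5+ GPUs, and __ldg() forces the same path explicitly:

```cuda
struct SKY_GRID_TYPE { float lat; float lon; };  // placeholder definition

__global__ void use_grid(const SKY_GRID_TYPE * __restrict__ grid, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Either rely on const __restrict__, or load explicitly via __ldg().
        float lat = __ldg(&grid[i].lat);
        // ... use lat ...
    }
}
```

Because this reads from ordinary global memory, the array can be cudaMalloc'd at whatever size the runtime calculation produces, with no 64 kbyte limit and no uniform-access expectation.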

Robert Crovella
  • I can't allocate all 64 kbytes because I also need other things in constant memory, but allocating more than I need and then using just what I need sounds like a good idea and should work. How does the speed of constant memory compare with read-only cached global memory? – tripatheea Jan 08 '20 at 15:18
  • Both of them are on-chip GPU caches backed by GPU DRAM. Performance-wise they will be similar on a cache hit, subject to the expectation of uniform access for constant memory. Otherwise, cache sizes, behavior, and performance are generally unpublished by NVIDIA. – Robert Crovella Jan 08 '20 at 15:29