I'm trying to write a CUDA application that is templated on float and double, since I would like it to run on both single- and double-precision cards. The application uses dynamically allocated global memory, dynamically allocated shared memory, constant memory, and static global memory.
I've seen examples of templating dynamically allocated global and shared memory variables. I also realize that constant memory is declared statically, so templating it is generally not possible, as stated in this post: Defining templated constant variables in cuda.
I've been unable to find any workarounds to this constant memory issue, which surprises me because I'm sure I'm not the first to encounter this problem. At the moment it seems I am faced with having to write two copies of the same application, one for doubles and one for floats, if I want to use constant memory. I'm hoping this isn't the case.
As a workaround, I'm considering writing a (virtual?) base class that is templated and implements everything except the constant memory variable declarations. I would then write two classes that inherit from the base (one for floats, one for doubles) and mainly just handle the constant variable declarations. Will this strategy work, or is there an obvious flaw? I thought I'd ask before implementing the design only to find that it doesn't work. If it does not work, are there any other proven strategies that at least alleviate the problem? Or will I simply have to write two copies of the application, one for float and one for double?
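To make the idea concrete, here is a minimal sketch of the kind of split I have in mind. It is an illustration under assumptions, not a tested design: all names (`c_coeffs_f`, `c_coeffs_d`, `ConstMem`, `scale`, `run`) and the array size of 4 are hypothetical. To keep it short, the per-type piece is written as a specialized helper struct rather than a derived class.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Constant memory is declared statically at file scope, once per type.
// (Names and the size 4 are hypothetical, chosen for illustration.)
__constant__ float  c_coeffs_f[4];
__constant__ double c_coeffs_d[4];

// Per-type piece: the only code that differs between float and double.
template <typename T> struct ConstMem;

template <> struct ConstMem<float> {
    __device__ static const float* coeffs() { return c_coeffs_f; }
    static cudaError_t upload(const float* h) {
        return cudaMemcpyToSymbol(c_coeffs_f, h, sizeof(c_coeffs_f));
    }
};

template <> struct ConstMem<double> {
    __device__ static const double* coeffs() { return c_coeffs_d; }
    static cudaError_t upload(const double* h) {
        return cudaMemcpyToSymbol(c_coeffs_d, h, sizeof(c_coeffs_d));
    }
};

// Shared templated kernel: written once, picks the constant array via ConstMem<T>.
template <typename T>
__global__ void scale(T* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= ConstMem<T>::coeffs()[i % 4];
}

// Shared templated host code: upload constants, run the kernel, copy back.
template <typename T>
cudaError_t run(T* h_data, int n, const T* h_coeffs) {
    T* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(T));
    if (err != cudaSuccess) return err;
    err = ConstMem<T>::upload(h_coeffs);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_data, h_data, n * sizeof(T), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale<T><<<(n + 127) / 128, 128>>>(d_data, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, n * sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err;
}

int main() {
    const float  cf[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const double cd[4] = {1.0, 2.0, 3.0, 4.0};
    float  xf[8];
    double xd[8];
    for (int i = 0; i < 8; ++i) { xf[i] = 1.0f; xd[i] = 1.0; }

    // The same templated entry point is used for both precisions.
    if (run(xf, 8, cf) != cudaSuccess || run(xd, 8, cd) != cudaSuccess) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("float: %g %g, double: %g %g\n", xf[1], xf[3], xd[1], xd[3]);
    return 0;
}
```

The intent is that only the two `ConstMem` specializations and the two `__constant__` declarations are duplicated, while the kernel and host logic stay templated. Whether this scales to a full application is exactly what I'm unsure about.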