So I have the following code:

File: Cuda.cu

template <typename T>
__global__ void xpy( int n, T *x, T *y, T *r )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) r[i] = x[i] + y[i];
}

bool mtx_mtx_add( float *a1, float *a2, float *r, const int &numElements )
{
    // snip
    xpy<<<numBlocks, blockSize>>>(numElements, a1, a2, r);
}
bool mtx_mtx_add( int *a1, int *a2, int *r, const int &numElements ) {:::}
bool mtx_mtx_add( long long *a1, long long *a2, long long *r, const int &numElements ) {:::}

File: Calling Code

extern "C" bool mtx_mtx_add( float *a1, float *a2, float *r, int &numElements );
extern "C" bool mtx_mtx_add( float *a1, float *a2, float *r, int &numElements );
extern "C" bool mtx_mtx_add( float *a1, float *a2, float *r, int &numElements );

int main()
{
... ...
mtx_mtx_add(...);
}

Now what I want is for the mtx_mtx_add function to be templated. Is this possible, and if so, how?

aCuria
1 Answer

Programming in CUDA is basically C++: you can use all the features of the C++ language just as you would in a standard C++ program.

You can create the function template as follows:

template <typename T>
bool mtx_mtx_add(T *a1, T *a2, T *r, const int &numElements)
{
    // numBlocks and blockSize computed as before (the "// snip" part of the question)
    xpy<T><<<numBlocks, blockSize>>>(numElements, a1, a2, r);
    return cudaGetLastError() == cudaSuccess;
}

Then you can explicitly instantiate the function template for the data types you need (these are explicit instantiations, not specializations):

template bool mtx_mtx_add<float>(float* a1, float* a2, float* r, const int& numElements);
template bool mtx_mtx_add<int>(int* a1, int* a2, int* r, const int& numElements);
sgarizvi
  • What files do each of these go in? I can't get it to work. If I put the templated mtx_mtx_add inside Cuda.cu, it can't be "seen" outside of Cuda.cu – aCuria Jan 30 '13 at 10:56
  • Both of these (the definition and the explicit instantiations) should be in the `.cu` file, while the declaration of the template goes in the header file. Separating the declaration and definition of a C++ template into source and header files is actually a complex task; I'm afraid I can't help much on it. – sgarizvi Jan 30 '13 at 11:02
  • 1
    This works as expected after I added the above code to the .cu file and forward-declared the function wherever I want to use it: `template <typename T> void mtx_mtx_add(T *x, T *y, T *r, const int &numElements);` – aCuria Jan 30 '13 at 12:02
  • Unfortunately, the need to explicitly instantiate the function template in the .cu file can be an issue in some situations. For example, if my function prototype were `template <int M, int N> Matrix<M, N> sum( const Matrix<M, N> & lhs, const Matrix<M, N> & rhs );` then I would have to instantiate it for all M and N... not cool. – aCuria Jan 30 '13 at 12:05
  • Yeah, that's the problem with C++ templates: either the template definition has to be visible to the code using it, or the template has to be instantiated for every possible set of input arguments. – sgarizvi Jan 30 '13 at 12:10
  • Usually it's not an issue having the definition visible to the code using it, but is that possible in this case, where the templates must be compiled with nvcc? Or is this impossible? – aCuria Jan 30 '13 at 12:16
  • `nvcc` is just another compiler with the extended capability to compile CUDA device code. Any C or C++ code looks the same to `nvcc` as it does to `gcc` or Visual C++, so `nvcc` can compile C++ templates just fine. – sgarizvi Jan 30 '13 at 15:47