
I am wondering if there is an easy way to have different versions of a kernel for different architectures. Or is the only possibility to define independent kernels in separate files and ask nvcc to compile each file for a different architecture?

Daniel

2 Answers


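You can do that with preprocessor directives on the `__CUDA_ARCH__` macro. Something like:

```cuda
__global__ void kernel(/* args */)
{
#if __CUDA_ARCH__ >= 350
    // do something (compute capability 3.5 and above)
#else
    // do something else (older architectures)
#endif
}
```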
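A more concrete sketch of the same idea, under stated assumptions: `scale_kernel` and `scale_on_device` are hypothetical names, and `__ldg` (a load through the read-only data cache, available from compute capability 3.5) stands in for "do something", with a plain load as the fallback. Because nvcc preprocesses the device code once per target architecture, each pass only sees the branch selected by its own `__CUDA_ARCH__` value, so code an older architecture cannot compile never reaches that pass.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: out[i] = s * in[i].
__global__ void scale_kernel(const float* __restrict__ in, float* __restrict__ out,
                             float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if __CUDA_ARCH__ >= 350
    // Compute capability 3.5 and above: load through the read-only data cache.
    out[i] = s * __ldg(&in[i]);
#else
    // Older architectures: ordinary global-memory load.
    out[i] = s * in[i];
#endif
}

// Hypothetical host wrapper: copies h_in to the device, runs the kernel and
// copies the result into h_out. Returns false on any CUDA error.
bool scale_on_device(const float* h_in, float* h_out, float s, int n)
{
    if (n <= 0) return true;  // nothing to do; avoids an empty launch
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // No host-side dispatch: the runtime loads the embedded binary
        // that matches the current device.
        scale_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, s, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Built with, for example, `nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -c scale.cu` (the file name is hypothetical), the object carries one binary per listed architecture in a single source file, and the runtime picks the matching one at launch. Recent CUDA toolkits no longer accept these Kepler targets, so substitute architectures your nvcc supports.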
Vitality

With a little more C++, JackOLantern's answer slightly modified:

```cuda
template <unsigned int ARCH>
__global__ void kernel(/* args */)
{
    // ARCH is a compile-time constant chosen by the host code at launch
    switch (ARCH)
    {
    case 35:
        // do something
        break;
    case 30:
        // do something else
        break;
    case 20:
        // do something else
        break;
    default:
        // do something for all other ARCH values
        break;
    }
}
```

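EDIT: to fix the error @sgar91 pointed out:

You can launch the kernel using the properties of your CUDA device, queried via `cudaGetDeviceProperties`:

```cuda
cudaDeviceProp props;
int devId = 0;
cudaGetDevice(&devId);                             // currently selected device
cudaGetDeviceProperties(&props, devId);
unsigned int cc = props.major * 10 + props.minor;  // e.g. 3.5 -> 35

switch (cc)
{
case 35:
    kernel<35><<<1, 1>>>(/* args */);
    break;
...
}
```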
Michael Haidl
  • But [`__CUDA_ARCH__` is not defined in host code](http://stackoverflow.com/a/8809924/1231073), so how can it be used when launching the kernel? – sgarizvi Feb 19 '14 at 07:28
  • @sgar91 thanks for the hint, I've been away from using nvcc to compile CUDA code for too long. Answer edited. – Michael Haidl Feb 19 '14 at 08:11
  • This won't work. As long as template instantiation is unconditional, each instance will be compiled for every architecture. If any of the code variants use features which are not supported by one of the target architectures, a compilation failure will result (and that is basically the entire point of the question).... – talonmies Feb 19 '14 at 09:05
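The exact-match `switch (cc)` in the second answer falls through to `...` for any compute capability it does not list (for example 3.7 or 5.0). A hedged sketch of a host-only helper, with the hypothetical names `pick_instance` and `pick_instance_for_current_device`, maps the queried major/minor version to the highest listed variant that does not exceed the device's capability. It assumes newer architectures support whatever the older variants use. As talonmies points out, this only chooses which instance to launch; every instance is still compiled for every target, so architecture-specific code inside the kernel still needs a `__CUDA_ARCH__` guard.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns the ARCH value (35, 30 or 20) of the most
// specialised kernel<ARCH> instance the device can use, or 0 for the
// generic fallback. Pure host code: __CUDA_ARCH__ plays no part here.
unsigned int pick_instance(int major, int minor)
{
    const unsigned int cc = major * 10 + minor;    // e.g. 3.5 -> 35
    const unsigned int variants[] = {35, 30, 20};  // descending order
    for (unsigned int v : variants)
        if (cc >= v) return v;
    return 0;
}

// Hypothetical helper: queries the current device and returns the instance
// to launch, or 0 (generic) if the query fails.
unsigned int pick_instance_for_current_device()
{
    int dev = 0;
    cudaDeviceProp props;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&props, dev) != cudaSuccess)
        return 0;
    return pick_instance(props.major, props.minor);
}
```

The result can then drive the launch, e.g. `switch (pick_instance_for_current_device()) { case 35: kernel<35><<<1, 1>>>(/* args */); break; ... }`.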