I am wondering if there is an easy way to have different versions of a kernel for different architectures. Is there an easy way, or is the only possibility to define independent kernels in independent files and ask nvcc to compile each file for a different architecture?
2 Answers
8
You can do that with preprocessor directives. Something like:
__global__ void kernel(...) {
#if __CUDA_ARCH__ >= 350
    /* do something */
#else
    /* do something else */
#endif
}
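For instance, a minimal sketch of such a kernel (hypothetical; it uses the sm_35 read-only-cache intrinsic __ldg where it is available and a plain global load elsewhere):

__global__ void copy_kernel(const float *in, float *out)
{
#if __CUDA_ARCH__ >= 350
    // compute capability 3.5+: load through the read-only data cache
    out[threadIdx.x] = __ldg(&in[threadIdx.x]);
#else
    // older architectures: ordinary global memory load
    out[threadIdx.x] = in[threadIdx.x];
#endif
}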

Vitality
0
With a little more C++, JackOLantern's answer slightly modified:
template <unsigned int ARCH>
__global__ void kernel(...)
{
    switch (ARCH)
    {
    case 35:
        /* do something */
        break;
    case 30:
        /* do something else */
        break;
    case 20:
        /* do something else */
        break;
    default:
        /* do something for all other ARCH */
        break;
    }
}
EDIT: to fix the error @sgar91 pointed out: you can query the properties of your CUDA device on the host and launch the matching instantiation:
cudaDeviceProp props;
cudaGetDeviceProperties(&props, devId);
unsigned int cc = props.major * 10 + props.minor;

switch (cc)
{
case 35:
    kernel<35><<<1, 1>>>(/* args */);
    break;
...
}
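A self-contained sketch of that host-side dispatch (the kernel body, argument list, and launch configuration are placeholders of my own; error checking omitted):

#include <cuda_runtime.h>

template <unsigned int ARCH>
__global__ void kernel(float *data) { /* arch-specific body as above */ }

int main()
{
    int devId = 0;
    cudaDeviceProp props;
    cudaGetDeviceProperties(&props, devId);
    unsigned int cc = props.major * 10 + props.minor;

    float *d_data;
    cudaMalloc(&d_data, 32 * sizeof(float));

    // instantiate the template that matches the device's compute capability
    switch (cc)
    {
    case 35: kernel<35><<<1, 32>>>(d_data); break;
    case 30: kernel<30><<<1, 32>>>(d_data); break;
    case 20: kernel<20><<<1, 32>>>(d_data); break;
    default: kernel<0><<<1, 32>>>(d_data);  break;
    }

    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}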

Michael Haidl
-
But [`__CUDA_ARCH__` is not defined in host code](http://stackoverflow.com/a/8809924/1231073), so how can it be used when launching the kernel? – sgarizvi Feb 19 '14 at 07:28
-
@sgar91 thanks for the hint, I've been away from compiling CUDA code with nvcc for too long. Answer edited. – Michael Haidl Feb 19 '14 at 08:11
-
This won't work. As long as template instantiation is unconditional, each instance will be compiled for every architecture. If any of the code variants use features which are not supported by one of the target architectures, a compilation failure will result (and that is basically the entire point of the question)... – talonmies Feb 19 '14 at 09:05