I have compiled my CUDA/C++ project for all GPU compute capabilities by generating PTX assembly code for each one (1.x, 2.x, 3.x, 5.0).
The problem is that my kernel's efficiency for a given compute capability (CC) depends on the value X of MACRO, which is defined at compile time.
So, is there a way to associate the value of X with a specific CC?
I have tried using __CUDA_ARCH__ as follows, but the compiler says that identifier MACRO is undefined.
Thank you.
#ifdef __CUDA_ARCH__
#if (__CUDA_ARCH__ >= 500)
#define MACRO 10
#elif (__CUDA_ARCH__ < 500)
#define MACRO 32
#endif
#endif
__global__ void kernel ()
{
// some device code using MACRO
}
int main()
{
// some host code using MACRO
kernel <<< >>> ();
return 0;
}
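
For reference, here is a minimal sketch of one possible arrangement, not a definitive fix. It relies on the fact that nvcc defines __CUDA_ARCH__ only during its device compilation passes, so in the snippet above MACRO is never defined when the host code is compiled. The host fallback value of 32 and the helper name macroForCC are assumptions made for illustration. The thresholds simply mirror the ones in the snippet above.

```cpp
// nvcc defines __CUDA_ARCH__ only while compiling device code, once per
// target architecture (for example 500 for CC 5.0). The host pass never
// sees it, so MACRO also needs a definition for that case.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 500)
#define MACRO 10
#elif defined(__CUDA_ARCH__)
#define MACRO 32
#else
// Host pass (or a plain C++ compiler): assumed fallback value, chosen only
// so that the identifier exists. It does not reflect the GPU in use.
#define MACRO 32
#endif

// Hypothetical helper: repeats the same thresholds so that host code can
// pick the matching value at run time from the device's major CC number.
constexpr int macroForCC(int ccMajor) { return ccMajor >= 5 ? 10 : 32; }

#ifdef __CUDACC__
// Device code can use MACRO directly; each device pass gets its own value.
__global__ void kernel(int *out) { out[threadIdx.x] = MACRO; }
#endif
```

On the host side, the major compute capability can be read from the `major` field of the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties` and passed to `macroForCC`, so that host logic such as launch configuration agrees with the value the kernel was compiled with.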