
Does anybody know how to check whether code is running on the GPU or the CPU in CUDA?

__device__ __host__  double count_something(double variable) {
  if (RUN_ON_GPU) {
    use_cuda_variables();
  } else {
    use_cpu_variables();
  }
}
dda
T_T

2 Answers


There is no way to check at runtime which architecture a piece of code is running on, but there is also no need to, because this can be determined at compile time and handled accordingly. nvcc defines several preprocessor symbols that can be used to tell which compilation pass (host or device) is in progress. The key symbol is __CUDA_ARCH__, which is never defined when compiling host code and always defined when compiling device code.

So it is possible to write a function like this:

__device__ __host__ float function(float x)
{
#ifdef __CUDA_ARCH__
    return 10.0f * __sinf(x);
#else
    return 10.0f * sin(x);
#endif
}

which will emit different code depending on whether it is compiled for the GPU or host. You can read a more thorough discussion about compilation steering in this Stack Overflow question or in the C language extensions section of the CUDA programming guide.
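As a minimal sketch of the same idea, the file below is written so that it builds both with nvcc and with an ordinary C++ compiler. The names `HOST_DEVICE`, `compiled_for_device`, and `scaled_sine` are hypothetical and chosen only for illustration. `__CUDACC__` is the macro nvcc defines whenever it compiles a source file, so it is used here to hide the CUDA qualifiers from a non-CUDA compiler.

```cpp
#include <cmath>

// nvcc defines __CUDACC__; a plain host compiler does not, so the
// qualifiers expand to nothing there. HOST_DEVICE is a hypothetical name.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: reports which compilation pass produced this code.
// The answer is fixed at compile time, not discovered at run time.
HOST_DEVICE inline bool compiled_for_device()
{
#ifdef __CUDA_ARCH__
    return true;   // device pass
#else
    return false;  // host pass
#endif
}

// Same pattern as the function above: a fast intrinsic on the device,
// the standard library on the host.
HOST_DEVICE inline float scaled_sine(float x)
{
#ifdef __CUDA_ARCH__
    return 10.0f * __sinf(x);
#else
    return 10.0f * std::sin(x);
#endif
}
```

When called from host code, `compiled_for_device()` returns `false`. The same source called from inside a kernel would take the other branch, because nvcc compiles the function body once per pass.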

talonmies
  • This is not fully correct. In some cases this code doesn't work; I spent a lot of time debugging before I found a solution. – avtomaton Dec 03 '14 at 20:43
  • @avtomaton: What isn't correct? How does debugging fit into what is effectively just C++ preprocessor code? – talonmies Dec 03 '14 at 20:46
  • This is not fully correct. In some cases this code doesn't work; I spent a lot of time debugging before I found a solution. `__CUDA_ARCH__` can be defined even in host code, but in that case it is defined as 0. So a proper check is something like this: `__device__ __host__ float function(float x) { #if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ > 0)) return 10.0f * __sinf(x); #else // host code here #endif }` – avtomaton Dec 03 '14 at 20:56
  • `#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ > 0))` is definitely the correct answer. – Pedro Boechat Dec 18 '14 at 14:49

I can't format code properly in comments, so I am posting a full answer. Checking only whether __CUDA_ARCH__ is defined is not fully correct. In some cases that code doesn't work; I spent a lot of time debugging before I found a solution (the CUDA documentation does not currently mention this).
__CUDA_ARCH__ can be defined even in host code, but in that case it is defined as 0. A proper check therefore looks something like this:

__device__ __host__ float function(float x)
{
#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ > 0))
    // device code here
    return 10.0f * __sinf(x);
#else
    // host code here
    return 10.0f * sin(x);
#endif
}
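If this stricter check is needed in many places, one option is to evaluate it once and reuse the result. The sketch below assumes that approach. `COMPILING_FOR_DEVICE`, `HOST_DEVICE`, and `scaled_sine_checked` are hypothetical names, not part of CUDA, and the file is arranged so that it also builds with a plain C++ compiler.

```cpp
#include <cmath>

// nvcc defines __CUDACC__; elsewhere the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical macro: 1 only when __CUDA_ARCH__ is defined AND non-zero,
// which is the condition this answer recommends; 0 otherwise.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ > 0)
#define COMPILING_FOR_DEVICE 1
#else
#define COMPILING_FOR_DEVICE 0
#endif

HOST_DEVICE inline float scaled_sine_checked(float x)
{
#if COMPILING_FOR_DEVICE
    // device code here
    return 10.0f * __sinf(x);
#else
    // host code here
    return 10.0f * std::sin(x);
#endif
}
```

Because the macro always expands to 0 or 1, it can be tested with a plain `#if`, and the `defined(...) && ... > 0` condition is written in only one place.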
avtomaton