
How can I get the CUDA compute capability (version) at compile time through a #define? For example, if I use __ballot and compile with

nvcc -c -gencode arch=compute_20,code=sm_20  \
        -gencode arch=compute_13,code=sm_13  \
        source.cu

can I get the compute capability version in my code through a #define, so that I can choose between a code branch that uses __ballot and one that does not?

Alex

1 Answer


Yes. First, it's best to understand what happens when you use -gencode. NVCC will compile your input device code multiple times, once for each device target architecture. So in your example, NVCC will run compilation stage 1 once for compute_20 and once for compute_13.

Two preprocessor macros matter when nvcc compiles a .cu file: __CUDACC__ and __CUDA_ARCH__. Treat __CUDACC__ as a flag and test only whether it exists: it is defined when nvcc is the compiler and undefined otherwise.
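As a rough illustration of the existence test on __CUDACC__, the sketch below lets one header build both with nvcc and with an ordinary host compiler. The names HOST_DEVICE, CompilePass, and current_pass are made up for this example and are not part of CUDA.

```cpp
// Hypothetical helper macro: expands to CUDA qualifiers only under nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical labels for the three situations the preprocessor can see.
enum CompilePass {
    PLAIN_HOST_COMPILER,  // no nvcc involved at all
    NVCC_HOST_PASS,       // nvcc compiling the host side of a .cu file
    NVCC_DEVICE_PASS      // nvcc compiling for a compute_xy target
};

// Reports which of the three cases this particular compilation fell into.
HOST_DEVICE inline CompilePass current_pass() {
#if defined(__CUDA_ARCH__)
    return NVCC_DEVICE_PASS;
#elif defined(__CUDACC__)
    return NVCC_HOST_PASS;
#else
    return PLAIN_HOST_COMPILER;
#endif
}
```

Built with a plain C++ compiler, current_pass() returns PLAIN_HOST_COMPILER, because neither macro is defined there.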

__CUDA_ARCH__ is defined only while nvcc compiles device code. Its integer value identifies the virtual architecture (compute_xy) currently being compiled for:

  • 100 = compute_10
  • 110 = compute_11
  • 200 = compute_20

etc. To quote the NVCC documentation included with the CUDA Toolkit:

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
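The xy0 encoding described in the quote can be written out as simple arithmetic. This is only a sketch: arch_macro_value, arch_major, and arch_minor are hypothetical helper names, not anything provided by the CUDA Toolkit.

```cpp
// Hypothetical helpers illustrating the xy0 encoding; not part of CUDA.

// Value __CUDA_ARCH__ takes when compiling for compute_<major><minor>.
constexpr int arch_macro_value(int major, int minor) {
    return major * 100 + minor * 10;
}

// Recover the two components from a __CUDA_ARCH__-style value.
constexpr int arch_major(int arch) { return arch / 100; }
constexpr int arch_minor(int arch) { return (arch % 100) / 10; }

// The two targets from the question's nvcc command line.
static_assert(arch_macro_value(1, 3) == 130, "compute_13 maps to 130");
static_assert(arch_macro_value(2, 0) == 200, "compute_20 maps to 200");
```

Inside device code one could pass __CUDA_ARCH__ itself to arch_major and arch_minor. In host code the macro is undefined, so it has to be guarded with #ifdef first.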

So, in your case where you want to use __ballot(), you can do this:

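....
#if __CUDA_ARCH__ >= 200
    // predicate is a placeholder for each thread's condition;
    // __ballot() returns a bitmask of the warp lanes where it is non-zero
    unsigned int b = __ballot(predicate);
    int p = __popc(b & lanemask);
#else
    // do something else for earlier architectures
#endif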
harrism
  • Thanks a lot! It works :) And what does CUDA_VERSION mean? Is it equal to the version number of the CUDA Toolkit? – Alex Oct 03 '12 at 11:16
    Yes, [see here for example](http://developer.download.nvidia.com/compute/cuda/4_2/rel/toolkit/docs/online/group__CUDA__TYPES_g3c09bba9b1547aa69f1e346b82bcdb50.html). Actually, it's the major version times 1000 + minor version times 10, so 4.2 --> 4020. – harrism Oct 03 '12 at 11:25
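The version encoding mentioned in that comment can be decoded with integer arithmetic. A minimal sketch, assuming the major * 1000 + minor * 10 scheme stated above; toolkit_major and toolkit_minor are hypothetical names, and CUDA_VERSION itself comes from the toolkit's cuda.h header.

```cpp
// Hypothetical helpers; they only undo the major*1000 + minor*10 encoding.
constexpr int toolkit_major(int cuda_version) { return cuda_version / 1000; }
constexpr int toolkit_minor(int cuda_version) { return (cuda_version % 1000) / 10; }

// The example from the comment: toolkit 4.2 is encoded as 4020.
static_assert(toolkit_major(4020) == 4, "4020 decodes to major version 4");
static_assert(toolkit_minor(4020) == 2, "4020 decodes to minor version 2");
```

With the toolkit headers included, the same helpers could be applied to the real macro, for instance toolkit_major(CUDA_VERSION).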