
Is there any way to call different kernels depending on whether, for example, nvcc -arch=sm_11 or nvcc -arch=sm_20 was used to compile the code? To be a bit more explicit:

if (FANCY_FLAG == CU_TARGET_COMPUTE_11)
    // Do some conversions here..
    krnl1<<<GRID_DIM1, BLOCK_DIM1>>>(converted_value1);
else if (FANCY_FLAG == CU_TARGET_COMPUTE_20)
    krnl2<<<GRID_DIM2, BLOCK_DIM2>>>(value1);

As you can see, I found the CUjit_target_enum in cuda.h, but I wasn't able to find out whether nvcc defines any macro that corresponds to one of the enum's values.

My intention for this is that I don't know whether my device supports double precision floats or not. If it doesn't, I have to convert my data from double to float and, hence, run a different kernel (yes, I'd prefer to run the kernel with double precision over single precision wherever possible).

I'd also appreciate a completely different approach as long as it does the trick.

pdresselhaus
  • Take a look at [this question](http://stackoverflow.com/questions/8796369/cuda-and-nvcc-using-the-preprocessor-to-choose-between-float-or-double), there is a solution for your problem there. It's not run-time checking, but it does the trick. – aland Aug 02 '12 at 15:03

1 Answer

  1. In the device code, check the value of the __CUDA_ARCH__ macro.
  2. In the host code, check the major and minor fields of the device properties.
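A minimal sketch of the two checks (the kernel name and device index are illustrative, not from the original answer):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Step 1: in device code, __CUDA_ARCH__ reflects the architecture the
// code was *compiled* for (e.g. 110 for -arch=sm_11, 200 for -arch=sm_20).
__global__ void archProbe(int *arch)
{
    *arch = __CUDA_ARCH__;
}

int main()
{
    // Step 2: in host code, the device properties report what the
    // hardware actually present supports.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("compute capability of device 0: %d.%d\n", prop.major, prop.minor);
    return 0;
}
```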
Eugene
  • Imagine a cc1.2 device and code compiled with -arch=sm_11. Looking at major and minor only, you would be good calling the double-kernel. But since you compiled for cc1.1, that kernel does not exist. How do you overcome the limitation that major/minor does not accord with __CUDA_ARCH__? Is it reasonable to write an additional kernel which just returns __CUDA_ARCH__? You could call the kernel once and store its return value somewhere? – pdresselhaus Aug 06 '12 at 12:31
  • __CUDA_ARCH__ reflects the compilation options. Its value does not reflect current device - that is __CUDA_ARCH__ will be 110 on any compatible CUDA device if you used arch=compute_11 (sm_11 code will only run on SM 1.1 devices). You should use conditional compilation to write the code for a specific device (e.g. #if __CUDA_ARCH__ >= 130 #else #endif). Then you will need to compile your code for several architectures. CUDA driver will load an image for proper architecture so you don't have to do anything specific in this case. – Eugene Aug 06 '12 at 15:52
  • Another thing to consider is having different kernels for different devices - then you may choose the kernel to run depending on device properties. – Eugene Aug 06 '12 at 15:53
  • I understand the fact that I should use conditional compilation from the link aland posted as a response to my original question. My problem is how do I, in host code, know which kernel is currently available? Since **__CUDA_ARCH__** is not available in the host trajectory, I cannot check for it. The point is, I would need to convert my data to float any time *#if __CUDA_ARCH__ >= 130 #else #endif* evaluates to false. So what do I do? – pdresselhaus Aug 08 '12 at 14:59
  • You detect the device and then prepare data depending on the device capability, e.g. if ((device.major == 1 && device.minor > 2) || device.major > 1) { /* double path */ } else { /* float path */ } – Eugene Aug 08 '12 at 15:55
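Putting the thread together, the host-side dispatch could look roughly like this. The kernel names, grid/block sizes, and the host-side conversion are illustrative; the binary is assumed to be built for both architectures (e.g. nvcc -gencode arch=compute_11,code=sm_11 -gencode arch=compute_13,code=sm_13), so the driver loads the image matching the device:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels, both compiled into the same fat binary.
__global__ void krnlDouble(const double *v, int n);
__global__ void krnlFloat(const float *v, int n);

void dispatch(const std::vector<double> &data)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Double precision first appeared with compute capability 1.3.
    bool hasDouble = (prop.major > 1) ||
                     (prop.major == 1 && prop.minor >= 3);

    int n = (int)data.size();
    int blocks = (n + 255) / 256;

    if (hasDouble) {
        double *d;
        cudaMalloc(&d, n * sizeof(double));
        cudaMemcpy(d, data.data(), n * sizeof(double), cudaMemcpyHostToDevice);
        krnlDouble<<<blocks, 256>>>(d, n);
        cudaFree(d);
    } else {
        // Demote the data on the host, then run the single-precision kernel.
        std::vector<float> f(data.begin(), data.end());
        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemcpy(d, f.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        krnlFloat<<<blocks, 256>>>(d, n);
        cudaFree(d);
    }
}
```

This sidesteps the __CUDA_ARCH__ limitation entirely: the host never asks which image was compiled, it only asks what the hardware supports, and the fat binary guarantees a matching kernel exists.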