
I am trying to compile CUDA with clang, but the code I am trying to compile depends on a specific nvcc flag (-default-stream per-thread). How can I tell clang to pass the flag to nvcc?

For example, I can compile with nvcc and everything works fine:

nvcc -default-stream per-thread *.cu -o app

But when I compile with clang, the program does not behave correctly, because I cannot pass the -default-stream flag:

clang++ --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 *.cu -o app -lcudart_static -ldl -lrt -pthread

How do I get clang to pass flags to nvcc?

Increasingly Idiotic
  • It's not a general answer about compiler flags for clang, but for this particular one (`-default-stream per-thread`), [this blog](https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/) indicates that an alternative method to get the same functionality is "`#define` the `CUDA_API_PER_THREAD_DEFAULT_STREAM` preprocessor macro before including CUDA headers (cuda.h or cuda_runtime.h)." That may be worth a try with clang. – Robert Crovella Oct 13 '19 at 18:56
  • Thanks, that blog post is extremely useful for this situation. For whatever reason, adding the define didn't work when compiling with clang (but it does when using nvcc?). Either way, that post has given me enough information to try and figure out something else. It is very much appreciated! – Increasingly Idiotic Oct 15 '19 at 15:54
  • Note that the define has to be in place before cuda_runtime_api.h is included. So the define doesn't work with nvcc as the blog states, because nvcc prepends that include to your file before any of your file is processed. I'm not that familiar with clang, so I wasn't sure if clang does that or not when processing CUDA files. Probably it does, which may explain why it's not working there. – Robert Crovella Oct 15 '19 at 17:22
  • I tried adding the define at the top of cuda_runtime.h directly, which worked for nvcc but not for clang. – Increasingly Idiotic Oct 16 '19 at 18:21

2 Answers


It looks like it may not be possible.

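nvcc is a compiler driver. Behind the scenes it runs the host compiler (gcc or clang) and NVIDIA's own device tools such as ptxas, passing them flags it generates itself, to create the binary. An nvcc option like -default-stream per-thread is translated into flags for those tools.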

e.g.

nvcc -default-stream per-thread foo.cu
# Behind the scenes (simplified)
gcc -E -custom-nvcc-generated-flag -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 foo.cu -o foo.ii   # preprocess
cicc foo.ii -o foo.ptx                                                                        # device code -> PTX
ptxas foo.ptx -o foo.cubin

When compiling CUDA with clang, clang compiles the device code directly to PTX and then calls ptxas:

clang++ foo.cu -o app -lcudart_static -ldl -lrt -pthread
# Behind the scenes (simplified)
clang -cc1 -triple nvptx64-nvidia-cuda foo.cu -o foo.ptx
ptxas foo.ptx -o foo.cubin

clang never actually calls nvcc. It just targets PTX and calls the PTX assembler (ptxas).

Since clang never runs nvcc, there is nothing to pass an nvcc flag to. The only option I can see is to find out which flags nvcc generates for that option and pass the equivalent ones to clang manually.
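For this particular option, a minimal sketch of that manual approach, assuming both nvcc and clang are installed. `nvcc -dryrun` lists the commands nvcc would run without executing them, so the generated flags can be read off its output; the `-D` definition used below is the one shown in the nvcc example above, and `foo.cu` is a hypothetical file name.

```shell
# List the commands nvcc would run (without running them) and look for the
# macro that -default-stream per-thread is translated into.
nvcc -dryrun -default-stream per-thread foo.cu 2>&1 | grep PER_THREAD

# Pass the same definition to clang by hand: the question's command plus -D.
clang++ --cuda-gpu-arch=sm_35 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 \
    -L/usr/local/cuda/lib64 *.cu -o app -lcudart_static -ldl -lrt -pthread
```

Because clang applies command-line `-D` definitions to both its host and device compilation passes, one flag covers both sides.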

Increasingly Idiotic
  • If a macro needs to be defined before inclusion of cuda_runtime.h, it will need to be passed via `-D` to clang. Under the hood clang does pre-include a bunch of CUDA headers (and so does nvcc), so defining the macro in the source code would not have any effect, as the compiler would see it *after* the inclusion of cuda_runtime.h. – ArtemB Dec 30 '19 at 21:50

If you only need clang-specific features for the host code and not for the device code, you are probably looking for separate compilation and linking:

https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/

As @Increasingly-Idiotic points out, clang does not "call" nvcc internally, so I don't think you can pass arguments to it.