I'm afraid I don't have an answer that would have helped you four years ago. However, as is so often the case, upgrading to a newer CMake version improves things dramatically. CMake 3.18 was the first release to officially support compiling CUDA with Clang. I tried that version, but it didn't know how to use my clang++-12 installation; Clang 12 may simply have been released after CMake 3.18. No matter: on CMake 3.19+, setting CMAKE_CUDA_COMPILER "just works" with Clang 12.
First off, here's the CMakeLists.txt:
cmake_minimum_required(VERSION 3.20)
project(clang-cuda-test LANGUAGES CUDA)
add_executable(
    vectorAdd
    # Sources
    vectorAdd.cu
    # Headers
    helper_cuda.h
    helper_string.h
)
target_include_directories(vectorAdd PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
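If you want the configure step to confirm which CUDA compiler was picked up, a small check along these lines could go after the project() call. This is only a sketch and not part of my original setup; the message wording is my own, and it merely warns rather than failing the configure.

```cmake
# Optional sanity check, placed after project() in CMakeLists.txt.
# CMAKE_CUDA_COMPILER_ID and CMAKE_CUDA_COMPILER_VERSION are set by CMake
# once the CUDA language has been enabled.
message(STATUS "CUDA compiler: ${CMAKE_CUDA_COMPILER_ID} ${CMAKE_CUDA_COMPILER_VERSION}")

if(NOT CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")
  # Warning only; whether to hard-fail here is a project decision.
  message(WARNING "Expected Clang as the CUDA compiler, got ${CMAKE_CUDA_COMPILER_ID}")
endif()
```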
I copied the source files (vectorAdd.cu, helper_cuda.h, and helper_string.h) out of the CUDA 11 samples. Then I set Clang 12 as the CUDA compiler. Note that my CUDA 11.3 installation is newer than any version Clang 12 recognizes, which is why the build prints a warning:
alex@alex-ubuntu:~/test$ cmake -G Ninja -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=clang++-12
-- The CUDA compiler identification is Clang 12.0.1
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/clang++-12 - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alex/test/build
alex@alex-ubuntu:~/test$ cmake --build build/ -- -v
[1/2] /usr/bin/clang++-12 -I../ -O3 -DNDEBUG --cuda-gpu-arch=sm_52 --cuda-path=/usr/local/cuda -MD -MT CMakeFiles/vectorAdd.dir/vectorAdd.cu.o -MF CMakeFiles/vectorAdd.dir/vectorAdd.cu.o.d -x cuda -c ../vectorAdd.cu -o CMakeFiles/vectorAdd.dir/vectorAdd.cu.o
clang: warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11030. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
[2/2] : && /usr/bin/clang++-12 CMakeFiles/vectorAdd.dir/vectorAdd.cu.o -o vectorAdd -lcudadevrt -lcudart_static -lrt -lpthread -ldl -L"/usr/local/cuda/lib64" && :
alex@alex-ubuntu:~/test$ ./build/vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
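Two details in the build log above can be controlled from CMakeLists.txt: the --cuda-gpu-arch=sm_52 flag, which is a default nobody asked for explicitly, and the -Wunknown-cuda-version warning. The following is a hedged sketch of how that might look. The architecture value 75 is purely an assumed example, and suppressing the warning only hides the message; Clang still treats the toolkit as the newest version it knows about.

```cmake
# Assumption: 75 (sm_75) is only an example value; use your GPU's compute capability.
set_property(TARGET vectorAdd PROPERTY CUDA_ARCHITECTURES 75)

# Silence -Wunknown-cuda-version only when Clang is the compiler for CUDA sources,
# so the flag is never handed to nvcc.
target_compile_options(vectorAdd PRIVATE
  "$<$<COMPILE_LANG_AND_ID:CUDA,Clang>:-Wno-unknown-cuda-version>")
```

The architecture can also be chosen at configure time with -DCMAKE_CUDA_ARCHITECTURES=75, which initializes the CUDA_ARCHITECTURES property for every target without touching the CMakeLists.txt.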