On CPUs, you can use the thread sanitizers built into compilers: GCC and Clang both support this through the -fsanitize=thread option. You can find more information in, for example, the LLVM documentation. Note that these tools are quite new and thus possibly a bit experimental. Alternatively, Valgrind's Helgrind can help you find the synchronization issues that often cause race conditions. If you are strongly tied to LLVM, you can try Archer. There are also several non-free tools (including Intel Inspector and Coderrect), mainly based on the last decade of active public research on the topic (see here for example).
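To make this concrete, here is a minimal sketch of the kind of bug these tools are meant to catch. The function names count_racy and count_atomic are hypothetical and only serve the illustration: the first increments a plain variable from several threads without synchronization (a data race), the second does the same work through std::atomic.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Racy version (hypothetical example): several threads increment a plain
// long without any synchronization. This is a data race, hence undefined
// behaviour, so the returned value cannot be relied upon.
long count_racy(int num_threads, int iters) {
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                ++counter;  // unsynchronized read-modify-write
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}

// Race-free version: the same work, but every increment is atomic.
long count_atomic(int num_threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Assuming the code lives in a file called race.cpp (a made-up name) together with a main that calls count_racy(4, 100000), building it with something like g++ -std=c++17 -g -O1 -fsanitize=thread -pthread race.cpp and running the binary should make the sanitizer report a data race on counter, while the count_atomic variant should run without a report. Running an ordinary (non-sanitized) build under valgrind --tool=helgrind is the Helgrind equivalent.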
On CUDA-capable GPUs, the only simple, ready-to-use tool I am aware of is CUDA-MemCheck, which is similar to what Valgrind provides on CPUs. It can be combined with CUDA-GDB to find bugs in small CUDA codes fairly easily.
Finally, when you are facing reproducibility issues (as with race conditions), deterministic reverse debuggers can really make a difference. RR is a great open-source tool for that. I am not sure it supports applications running CUDA kernels, but it is certainly worth a try. Note that RR tends to run threads sequentially (although they are still preempted), which affects the resulting behaviour.