I am getting wrong numerical results from an application parallelized with OpenMP. Each OpenMP thread runs one or more streams on an NVIDIA GPU. I suspect that there is a race condition between OpenMP threads or CUDA streams while updating memory.

How can I find out which OpenMP threads or CUDA streams access the same main-memory address range? Are there any tools for this?

Kadir
  • @dreamcrash double. The host has Haswell CPUs and a V100. – Kadir Apr 13 '21 at 18:15
  • The largest difference is 10^9, which is too big, so the order of operations is not the reason. The difference is also not the same every time. – Kadir Apr 13 '21 at 22:15

2 Answers

On CPUs, you can use the thread sanitizer built into your compiler. GCC and Clang both support ThreadSanitizer through the -fsanitize=thread option. You can find more information in, for example, the LLVM documentation. Note that these tools are fairly new and may therefore be somewhat experimental. Alternatively, Valgrind's Helgrind can help you find the synchronization issues that often cause race conditions. If you are strongly tied to LLVM, you can try Archer. There are also several non-free tools (including Intel Inspector and Coderrect), mainly based on the last decade of active public research on the topic (see here for example).
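
To make the ThreadSanitizer route concrete, here is a minimal sketch of the kind of bug it is meant to flag. The function names `sum_racy` and `sum_fixed` are made up for illustration and are not taken from the question's code; the racy version updates a shared accumulator without synchronization, and the fixed version uses an OpenMP reduction.

```cpp
// Sketch only: sum_racy and sum_fixed are hypothetical names for illustration.
#include <cstddef>
#include <vector>

// Racy version: every thread performs an unsynchronized read-modify-write
// on the shared variable `total`, so updates can be lost and the result
// may change from run to run.
double sum_racy(const std::vector<double>& x) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    double total = 0.0;
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += x[i];  // data race on `total`
    }
    return total;
}

// Fixed version: each thread accumulates into a private copy of `total`,
// and OpenMP combines the private copies at the end of the loop.
double sum_fixed(const std::vector<double>& x) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

Calling `sum_racy` from a small driver built with something like `g++ -fopenmp -fsanitize=thread -g` (or the same flags with `clang++`) and run with more than one thread should produce a data race report pointing at the `total += x[i]` line. Be aware that ThreadSanitizer may also report false positives on correct OpenMP code when the OpenMP runtime itself is not instrumented, which is the gap Archer is meant to fill.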

On CUDA-capable GPUs, the only simple, ready-to-use tool I am aware of is CUDA-MemCheck, which is similar to what Valgrind provides on CPUs. Combined with CUDA-GDB, it makes bugs in small CUDA codes fairly easy to find.

Finally, when you are facing reproducibility issues (as with race conditions), deterministic reverse debuggers can really make a difference. RR is a great open-source tool for that. I am not sure it supports applications running CUDA kernels, but it is certainly worth a try. Note that RR tends to run threads sequentially (although they are preempted), which affects the resulting behaviour.

Jérôme Richard

Full disclosure: I work for Coderrect. I found the answer above very informative and helpful. I just want to clarify that Coderrect Scanner is currently free (an evaluation version with full functionality) and available from our website. It includes some simple features for dealing with CUDA code, which are still under active development, so I would encourage you to try it. Let us know how it goes; we welcome any feedback that helps us keep improving the tool.

chrisw