10

I'm debugging a crash in my OpenCL application. I attempted to use ASan to pin down where the problem originates, but then I discovered that when I enable ASan while recompiling, my application cannot find any OpenCL devices. Simply adding -fsanitize=address to the compiler options makes my program unable to use OpenCL.

With further testing, I am certain ASan is the reason.

Why is this happening? How can I use ASan with OpenCL?

An MCVE:

#include <CL/cl.hpp>
#include <vector>
#include <iostream>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    if(platforms.size() == 0)
      std::cout << "Compiled with ASan\n";
    else
      std::cout << "Compiled normally\n";
}

cl::Platform::get returns CL_SUCCESS but an empty list of platforms.

Some information about my setup:
GPU: GTX 780Ti
Driver: 418.56
OpenCL SDK: Nvidia OpenCL / POCL 1.3 with CPU and CUDA backend
Compiler: GCC 8.2.1
OS: Arch Linux (Kernel 5.0.7 x64)

Mary Chang
  • Could you provide an [MCVE](https://stackoverflow.com/help/mcve)? What are the error codes from the OpenCL APIs? – yugr Apr 18 '19 at 17:41
  • Sorry for the inconvenience. As soon as ASan is applied to an application, cl::Platform::get() (using the C++ wrapper) returns nothing. It returns normally, whereas it throws when it encounters an error. – Mary Chang Apr 19 '19 at 01:32
  • Thanks, what about the [error code](https://stackoverflow.com/a/24336429/2170527)? Which OpenCL implementation is this? I suggest adding all those details to the question. – yugr Apr 19 '19 at 08:34
  • Sorry for not being clear again. It returns normally, i.e. with an error code of CL_SUCCESS. I'm using both NVIDIA's OpenCL and POCL with the CUDA backend. But I think this is an ICD loader problem: it fails to list any platforms in the first place. – Mary Chang Apr 20 '19 at 04:46
  • Any new developments on this? This question turns up as the only match when searching for ASan and OpenCL. It would be good to find a solution for this one at some point... – lubgr Jun 14 '21 at 16:48

1 Answer

9

The NVIDIA driver is known to conflict with ASAN. It attempts to mmap(2) memory into a fixed virtual memory range within the process, which coincides with ASAN's write-protected shadow gap region. Given that ASAN reserves about 20TB of virtual address space on startup, such conflicts are not unlikely with other programs or drivers, too.

ASAN recognizes certain flags that may be set in the ASAN_OPTIONS environment variable. To resolve the shadow gap range conflict, set the protect_shadow_gap option to 0. For example, assuming a POSIX-like shell, you may run your program like

$ ASAN_OPTIONS=protect_shadow_gap=0 ./mandelbrot

A writable shadow gap incurs additional performance costs under ASAN, since an unprotected gap requires its own shadowing. This is why it's not recommended to set this option globally (e.g., in your shell startup script). Set it only for the programs that actually require it.

I'm nearly certain this is the root cause of your issue. I use ASAN with CUDA programs and always need to set this option. The failure reported by CUDA without it is very similar: a cudaErrorNoDevice error when I attempt to select a device.

  • Thank you so much for the excellent explanation with detailed background. This is really helpful! The proposed solution looks good at first attempt. I'll have a closer look. – lubgr Jun 18 '21 at 07:45