
I have a Python extension written in C/C++ that I want to use OpenMP offloading with. Compiling with NVIDIA's nvc++ works, and so does importing and running the extension in Python. The problem is that it does not use the GPU by default. Setting OMP_TARGET_OFFLOAD=MANDATORY leads to the following error:

 
(venv) bash-4.2$ python test_2d_c.py
2D (dependend) :
Fatal error: Could not run target region on device 0, execution terminated.

Using the same OpenMP directives in a pure C/C++ program works fine.
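To narrow down whether the problem is specific to the shared object, a small check like the following could be compiled once into a standalone executable and once into the extension. It is a sketch that assumes an OpenMP 4.5 or later compiler. The function names offload_device_count and target_region_ran_on_device are hypothetical; omp_get_num_devices and omp_is_initial_device are standard OpenMP runtime routines.

```cpp
// Sketch: report whether an OpenMP target region really runs on a device.
// Builds with or without OpenMP; without it, everything reports "host".
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of non-host devices visible to the OpenMP runtime (0 without OpenMP).
int offload_device_count() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// True if the target region executed on a device rather than on the host,
// i.e. the runtime did not fall back to host execution.
bool target_region_ran_on_device() {
    int on_host = 1;
#ifdef _OPENMP
    // map(from:) copies the flag back from wherever the region executed.
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host == 0;
}
```

In the standalone case both values can simply be printed from main. Inside the extension, calling target_region_ran_on_device() from one of the module functions and returning the result to Python would show whether the .so silently falls back to the host. With OMP_TARGET_OFFLOAD=MANDATORY set and no usable device, the target region is expected to abort instead of returning false.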

By the way, the compiler output looks promising:

 
nvc++ -Iinclude -I/software/rome/SciPy-bundle/2021.05-foss-2021a/lib/python3.9/site-packages/numpy/core/include -I/home/h0/seth295c/cmi/venv/include -I/sw/installed/Python/3.9.5-GCCcore-10.3.0/include/python3.9 -c src/gamma_c.cpp -o build/temp.linux-x86_64-3.9/src/gamma_c.o -g -O3 -shared -std=c++17 -Minfo=mp -mp=gpu -mp -target=gpu
cmi::gamma_c_2d_naive(_object *, _object *):
     97, #omp target teams distribute parallel for
         97, Generating "nvkernel__ZN3cmi16gamma_c_2d_naiveEP7_objectS1__F1L97_1" GPU kernel
             Generating Tesla and Multicore code
             Generating reduction(+:res,.res22168p)
             Loop parallelized across teams and threads(128), schedule(static)
cmi::gamma_c_2d(_object *, _object *):
    228, #omp parallel
    240, #omp parallel
        240, Generating reduction(+:res)
cmi::gamma_c_2d_independence(_object *, _object *):
    338, #omp parallel
    350, #omp parallel
        350, Generating reduction(+:res)
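According to this output, only gamma_c_2d_naive contains a target construct; gamma_c_2d and gamma_c_2d_independence use plain "#omp parallel" regions and therefore run on the host regardless of the offload setting. The loop behind line 97 is a combined target teams distribute parallel for with a + reduction. The sketch below is a hypothetical stand-in in that style (the real gamma_c source is not shown, and sum_2d_offload is an invented name): it sums a row-major 2D array and maps the input and the scalar result explicitly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a naive 2D kernel (not the real gamma_c code):
// sums a row-major nx-by-ny array using the same combined construct and
// reduction(+:res) that -Minfo=mp reports for the naive version.
double sum_2d_offload(const std::vector<double>& a, std::size_t nx, std::size_t ny) {
    const double* p = a.data();
    const std::size_t n = nx * ny;
    double res = 0.0;
    // Copy the input to the device, reduce across teams and threads, and
    // copy the scalar back; map(tofrom:) is spelled out for compilers that
    // follow the OpenMP 4.5 mapping rules for scalars.
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: p[0:n]) map(tofrom: res) reduction(+:res)
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            res += p[i * ny + j];
        }
    }
    return res;
}
```

Without OpenMP the pragma is ignored and the loop runs serially on the host, which gives the same result for this input.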
  • Perhaps this is related: https://stackoverflow.com/questions/49718730/openmp-offloaded-target-region-executed-in-both-host-and-target-device – Niteya Shah Dec 12 '21 at 18:52
  • Just in case somebody faces the same problem: after contacting NVIDIA support, it turned out that compiling OpenMP offloading code into a shared object is not yet supported by the nvc++ compiler. – ThiloOS Feb 13 '22 at 17:52
