
TL;DR - Does GCC (trunk) already support OpenMP 4.0 offloading to nVidia GPU?

If so, what am I doing wrong? (description below).


I'm running Ubuntu 14.04.2 LTS.

I have checked out the most recent GCC trunk (dated 25 Mar 2015).

I have installed the CUDA 7.0 toolkit according to the Getting Started on Ubuntu guide. The CUDA samples run successfully, e.g. deviceQuery detects my GeForce GT 730.

I have followed the instructions from https://gcc.gnu.org/wiki/Offloading as well as https://gcc.gnu.org/install/specific.html#nvptx-x-none

I have installed nvptx-tools and nvptx-newlib (configure, make, sudo make install); newlib is also symlinked into GCC's trunk directory with ln -s.

Then I built the target accelerator nvptx-none compiler:

../../trunk/configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
make -j 9
sudo make install DESTDIR=/install

...and the host GCC compiler itself:

../trunk/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/install/prefix --with-cuda-driver=/usr/local/cuda --enable-languages=c,c++
make -j 9
sudo make install DESTDIR=/install

I have set the LD_LIBRARY_PATH accordingly:

export LD_LIBRARY_PATH=/install/usr/local/lib64:/install/usr/local/lib/gcc/nvptx-none/5.0.0/:/usr/local/cuda/lib64:$LD_LIBRARY_PATH

The mkoffload tool has definitely been built:

/install/usr/local/libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/mkoffload

and the target and host compilers are there as well:

/install/usr/local/bin/x86_64-pc-linux-gnu-gcc
/install/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc

But when I compile a sample program that queries the number of devices with omp_get_num_devices(), it prints 0:

$ /install/usr/local/bin/x86_64-pc-linux-gnu-gcc -fopenmp -foffload=nvptx-none main.c
$ ./a.out
0

When I add the -v (verbose) option to the target compiler's options, I get the following output:

$ /install/usr/local/bin/x86_64-pc-linux-gnu-gcc -fopenmp -foffload=nvptx-none="-v" main.c

Using built-in specs.
COLLECT_GCC=/install/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
Target: nvptx-none
Configured with: ../../trunk/configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
Thread model: single
gcc version 5.0.0 20150325 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-m64' '-S' '-fmath-errno' '-fsigned-zeros' '-ftrapping-math' '-fno-trapv' '-fno-strict-overflow' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-v' '-v' '-o' '/tmp/cccxIggp.mkoffload'
 /install/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/lto1 -quiet -dumpbase ccKOW9hi.o -m64 -auxbase-strip /tmp/cccxIggp.mkoffload -version -fmath-errno -fsigned-zeros -ftrapping-math -fno-trapv -fno-strict-overflow -fno-openacc -foffload-abi=lp64 -fopenmp -o /tmp/cccxIggp.mkoffload @/tmp/ccjRDWhp
GNU GIMPLE (GCC) version 5.0.0 20150325 (experimental) (nvptx-none)
    compiled by GNU C version 5.0.0 20150325 (experimental), GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU GIMPLE (GCC) version 5.0.0 20150325 (experimental) (nvptx-none)
    compiled by GNU C version 5.0.0 20150325 (experimental), GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
COMPILER_PATH=/install/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/:/install/usr/local/bin/../libexec/gcc/
LIBRARY_PATH=/install/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none/:/install/usr/local/bin/../lib/gcc/
COLLECT_GCC_OPTIONS='-m64' '-S' '-fmath-errno' '-fsigned-zeros' '-ftrapping-math' '-fno-trapv' '-fno-strict-overflow' '-fno-openacc' '-foffload-abi=lp64' '-fopenmp' '-v' '-v' '-o' '/tmp/cccxIggp.mkoffload'

So it looks like the toolchain gets invoked and the .mkoffload files are created.

Please help. If it should work, how can I diagnose what's wrong?

Marc Andreson
  • Are you certain you have a fully functional CUDA installation? – talonmies Mar 27 '15 at 10:04
  • @talonmies the CUDA samples run successfully detecting my nVidia GPU – Marc Andreson Mar 27 '15 at 10:14
  • Sorry, but I had to ask. You wouldn't believe the number of times people come here to ask why their code doesn't work and the root cause is that they don't have a functional CUDA installation. – talonmies Mar 27 '15 at 10:51
  • @talonmies np, please remove your comments – Marc Andreson Mar 27 '15 at 15:20
  • I've created a step-by-step guide for building GCC as well as Clang for OpenMP GPU-offloading (see https://github.com/pc2/OMP-Offloading), because I lectured on this subject this year at Paderborn University, Germany. I recommend Clang (the development version on github), because based on my tests it's less buggy than GCC and other versions of Clang. BTW, GCC has a limitation on the number of threads per team, which destroys the performance on GPU completely. – xin Mar 21 '20 at 14:58
  • @xin Awesome !! – Marc Andreson Jun 01 '20 at 05:54

1 Answer


TL;DR - Does GCC (trunk) already support OpenMP 4.0 offloading to nVidia GPU?

No.

Currently GCC supports only OpenMP 4.0 offloading to Intel Xeon Phi (KNL) and OpenACC 2.0 offloading to nVidia GPU.

There are ideas on supporting OpenMP 4.0 offloading to nVidia GPU: [1], [2], but implementation has not yet begun.

UPD 2017: GCC 7.1 now supports OpenMP 4.5 offloading to NVidia GPUs [3].

Ilya Verbin
  • Oh, so what is this [manual](https://gcc.gnu.org/wiki/Offloading) referring to? Does it mean the configuration of nvptx-none target compiler is required for OpenACC to work, not for OpenMP4 as I thought? Or are there any other prerequisites to offload work to GPU through OpenACC ? – Marc Andreson Mar 29 '15 at 12:57
  • Yes, the nvptx-none target compiler is required for OpenACC->PTX (as well as for OpenMP->PTX, once it is supported). I'll add to the wiki page that OpenMP->PTX is not yet supported, to avoid confusion. AFAIK this manual is complete regarding OpenACC->PTX, however I haven't tried it myself. – Ilya Verbin Mar 29 '15 at 13:12
  • Oh, that's sad news for me. Would you happen to know when nvidia offload will be possible with GCC? (or with any other compiler like clang?) – Marc Andreson Mar 29 '15 at 13:15
  • GCC 5.x definitely will not support this; as for GCC 6.x, I don't know. It's relatively easy to implement some subset of OpenMP on GPU, but supporting all kinds of pragmas in target regions looks quite difficult (see the 2 links above). Currently IBM is working on OpenMP to GPU offloading in clang: http://openmp.org/sc14/Booth-Sam-IBM.pdf – Ilya Verbin Mar 29 '15 at 13:30
  • Thanks! I will check this Clang's branch then, maybe it works already (at least partially, to the basic extent I need) – Marc Andreson Mar 29 '15 at 13:33