I'm trying to compile a simple vector addition example (daxpy)
with OpenMP offloading to a GPU. When I compile the code with g++
v10.2.0, I see the following error:
daxpy_loop_target.cpp:37:6: error: function ‘plasma_core_omp_daxpy_loop_target’ \
has been referenced in offloaded code \
but hasn’t been marked to be included in the offloaded code
It seems I need to mark that this function is going to be offloaded to GPU. How do I do that?
Please find my C++ source code, daxpy_loop_target.cpp, below:
#include <omp.h>
#include <cstdio>
#include "daxpy_core.hpp"
#include "daxpy_loop_target.hpp"
void plasma_daxpy_loop_target(int n, int nb, double da, double *dx, int incx, double *dy, int incy) {
    plasma_omp_daxpy_loop_target(n, nb, da, dx, incx, dy, incy);
}
void plasma_omp_daxpy_loop_target(int n, int nb, double da, double *dx, int incx, double *dy, int incy) {
    #pragma omp target
    for (int i = 0; i < n; i += nb) {
        plasma_core_omp_daxpy_loop_target(nb, da, &dx[i], incx, &dy[i], incy);
    }
}
void plasma_core_omp_daxpy_loop_target(int n, double da, double *dx, int incx, double *dy, int incy) {
    plasma_core_daxpy(n, da, dx, incx, dy, incy);
}
The function plasma_core_daxpy is defined in daxpy_core.cpp as:
#include "mkl_cblas.h"
#include "daxpy_core.hpp"
void plasma_core_daxpy(int n, double da, double *dx, int incx, double *dy, int incy) {
    cblas_daxpy(n, da, dx, incx, dy, incy);
}
My compiler and compiler flags are the following:
CXX = g++
CXXFLAGS =-fopenmp -foffload="-lm -latomic" -m64 ${INC}
Finally, the make output is the following:
g++ -fopenmp -foffload="-lm -latomic" -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" -c daxpy_util.cpp -o daxpy_util.o
g++ -fopenmp -foffload="-lm -latomic" -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" -c daxpy_core.cpp -o daxpy_core.o
g++ -fopenmp -foffload="-lm -latomic" -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" -c daxpy_loop_target.cpp -o daxpy_loop_target.o
g++ -fopenmp -foffload="-lm -latomic" -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" -c test_daxpy_loop_target.cpp -o test_daxpy_loop_target.o
g++ -fopenmp -foffload="-lm -latomic" -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" daxpy_util.o daxpy_core.o daxpy_loop_target.o test_daxpy_loop_target.o -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,--start-group /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_intel_lp64.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_sequential.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -o test_daxpy_loop_target
daxpy_loop_target.cpp:37:6: error: function ‘plasma_core_omp_daxpy_loop_target’ has been referenced in offloaded code but hasn’t been marked to be included in the offloaded code
37 | void plasma_core_omp_daxpy_loop_target(int n, double da, double *dx, int incx, double *dy, int incy) {
| ^
lto1: fatal error: errors during merging of translation units
compilation terminated.
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /netfs/smain01/system/local/apps/gcc10/10.2.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.2.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/lustre/system/local/apps/gcc/utilities/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [test_daxpy_loop_target] Error 1
Any hints or pointers would be greatly appreciated!
Update A
Other compilers provide less helpful error messages; e.g. clang++ v10.0.0 and nvc++ v21.3-0 both complain about an undefined reference to plasma_core_daxpy:
clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70 -std=c++17 -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" daxpy_util.o daxpy_core.o daxpy_loop_target.o test_daxpy_loop_target.o -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,--start-group /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_intel_lp64.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_sequential.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -o test_daxpy_loop_target
nvlink error : Undefined reference to '_Z17plasma_core_daxpyidPdiS_i' in '/tmp/daxpy_loop_target-df645f.cubin'
clang-10: error: nvlink command failed with exit code 255 (use -v to see invocation)
nvc++ -mp=gpu -gpu=managed -m64 -I"/lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/include" daxpy_util.o daxpy_core.o daxpy_loop_target.o test_daxpy_loop_target.o -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,-rpath,/lustre/system/local/apps/gcc10/10.2.0/bin -Wl,--start-group /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_intel_lp64.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_sequential.a /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -o test_daxpy_loop_target
nvlink error : Undefined reference to '_Z17plasma_core_daxpyidPdiS_i' in 'daxpy_loop_target.o'
pgacclnk: child process exit status 2: /netfs/smain01/system/local/apps/nvidia-sdk/21.3/Linux_x86_64/21.3/compilers/bin/tools/nvdd
make: *** [test_daxpy_loop_target] Error 2
Update B
Assume I would like to implement my own version of daxpy for the GPU. I'm trying to declare it as a function to be offloaded with:
#pragma omp begin declare target
extern void plasma_core_daxpy(int, double, double*, int, double*, int);
#pragma omp end declare target
However, none of the compilers I use (g++, nvc++ and clang++) understands this declaration. They all fail with errors similar to the following (clang++ v10.0.0):
daxpy_loop_target.cpp:16:13: error: expected an OpenMP directive
#pragma omp begin declare target
^
daxpy_loop_target.cpp:18:25: error: unexpected OpenMP directive '#pragma omp end declare target'
#pragma omp end declare target
Can you please advise which compiler versions support the #pragma omp begin declare target and #pragma omp end declare target directives?
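For comparison, as far as I can tell the "begin declare target" spelling was only introduced in OpenMP 5.1, while OpenMP 4.0 already defines the pair "#pragma omp declare target" / "#pragma omp end declare target". Below is a minimal sketch of what I am aiming for using that older spelling. It assumes a hand-written loop instead of cblas_daxpy (which is presumably only available as host code in MKL), positive increments only, and unit stride in the wrapper. The names my_daxpy and my_daxpy_on_target are hypothetical and not part of my project.

```cpp
#include <cassert>

// Hypothetical hand-written daxpy (y += da * x). It does not call MKL, so the
// compiler can also generate a device version. Assumes positive increments.
#pragma omp declare target
void my_daxpy(int n, double da, const double *dx, int incx, double *dy, int incy) {
    for (int i = 0; i < n; ++i) {
        dy[i * incy] += da * dx[i * incx];
    }
}
#pragma omp end declare target

// Hypothetical host-side wrapper: maps two unit-stride arrays of length n and
// calls my_daxpy inside a target region.
void my_daxpy_on_target(int n, double da, const double *dx, double *dy) {
    assert(n >= 0);
    #pragma omp target map(to: dx[0:n]) map(tofrom: dy[0:n])
    {
        my_daxpy(n, da, dx, 1, dy, 1);
    }
}
```

If no offload device is available, or the code is built without -fopenmp, the target region should simply run on the host, so the sketch can be checked for correctness before any GPU is involved.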