
The build log:

```
-------------- Clean: Release in OffloadTest (compiler: GNU GCC Compiler)---------------

Cleaned "OffloadTest - Release"

-------------- Build: Release in OffloadTest (compiler: GNU GCC Compiler)---------------

g++ -Wall -m64 -fopenmp -foffload=nvptx-none -fno-stack-protector -O2 -fopenmp -foffload=nvptx-none -fcf-protection=none -fno-stack-protector  -c /home/david/CBProjects/OffloadTest/main.cpp -o obj/Release/main.o
g++  -o bin/Release/OffloadTest obj/Release/main.o  -m64 -lgomp -s -lgomp  
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x0): undefined reference to `__offload_func_table'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x8): undefined reference to `__offload_funcs_end'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x10): undefined reference to `__offload_var_table'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x18): undefined reference to `__offload_vars_end'
collect2: error: ld returned 1 exit status
Process terminated with status 1 (0 minute(s), 0 second(s))
5 error(s), 0 warning(s) (0 minute(s), 0 second(s))
```

I have installed the following packages (with their descriptions):

gcc-9-offload-nvptx
    Description: This package provides offloading support for NVIDIA PTX. OpenMP and OpenACC programs linked with -fopenmp will by default add PTX code into the binaries, which can be offloaded to NVIDIA PTX-capable devices if available.
gcc-offload-nvptx
    Description: This package contains the libgomp plugin for offloading to NVIDIA PTX. The plugin needs the libcuda.so.1 shared library, which has to be installed separately.
nvptx-tools
    Description: This tool consists of nvptx-none-as, an "assembler" for PTX, and nvptx-none-ld, a "linker" for PTX. Additionally, the following symlinks are installed: nvptx-none-ar, a link to the GNU/Linux host system's ar, and nvptx-none-ranlib, a link to the GNU/Linux host system's ranlib.

I have verified that libcuda.so.1 is located in /lib/x86_64-linux-gnu.

The program is simple; it is just an example to help me get offloading up and running. It works fine if I remove the `target` keyword:

```cpp
#include <iostream>
#include <omp.h>

using namespace std;
#define iSize 200000
long *A, *B;

int main()
{
   A = new long[iSize];
   B = new long[iSize];
   long sum = 0;
   double dStart, dEnd;
   int iNumberOfDevices = omp_get_num_devices();
   int iInitialDevice = omp_get_initial_device(); // device number for host computer
   int iDeviceNumber = omp_get_default_device();

   dStart = omp_get_wtime();
#pragma omp parallel for
   for (long i=0; i<iSize; i++)
   {
      A[i] = i;
      B[i] = i+1;
   }
#pragma omp target parallel for reduction(+:sum)
   for (long i=0; i<iSize; i++)
   {
      for (long j=0; j<iSize; j++)
      {
         sum += 3 * A[i] - B[j];
      }
   }
   dEnd = omp_get_wtime();
   double dtime = dEnd - dStart;
   cout << "Number of devices = " << iNumberOfDevices << endl;
   cout << "Device number = " << iDeviceNumber << endl;
   cout << "Initial Device number (host processor) = " << iInitialDevice << endl;
   cout << endl;
   cout << "Sum = " << sum << endl;
   cout << "Processing time = " << dtime << " Seconds" << endl;
}
```

Any help is appreciated.

David
user4581301

1 Answer


To resolve the undefined references, specify `-fopenmp` on the link command line as well (and, if that isn't the default, `-foffload=nvptx-none` again) instead of `-lgomp`, which is duplicated there, by the way.

What I think is also missing are `omp target data` directives (or similar, such as `map` clauses) to set up the `A` and `B` arrays on the device.

tschwinge
  • Thanks for your thoughtful response, @tschwinge. I didn't realize that my IDE added the -lgomp switch automatically, so I removed the duplicate. I already have the -foffload=nvptx-none flag. I added map clauses to the second omp directive: `#pragma omp target parallel for map(to:iSize,A[0:iSize],B[0:iSize]) map(from:sum)`. That flagged the #define statement with `error: expected unqualified-id before numeric constant`, so I changed it to `long iSize = 200000;`. Now I am back to the same error messages I started with. – david-4142135 Jul 22 '21 at 14:25
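  • You don't need to `map` `iSize` if it is a `#define`: the preprocessor replaces it with the literal `200000`, which is not a variable and therefore cannot appear in a `map` clause. – tschwinge Jul 23 '21 at 08:11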
  • My main suggestion was to specify *also for the link invocation* (the second time you're invoking `g++`) _`-fopenmp` (and potentially again `-foffload=nvptx-none` if that isn't the default) instead of `-lgomp`_. – tschwinge Jul 23 '21 at 08:14
  • What IDE is it that adds `-lgomp`? Seems like a bug in the IDE. – tschwinge Jul 23 '21 at 08:16
  • I must have put -lgomp into the Release build options at some point, so it wasn't the IDE. Using -fopenmp in both the compile and link passes fixed the issue. Many thanks. – david-4142135 Jul 23 '21 at 14:03