
I have a CUDA program in which I would like to include external code consisting of Fortran with OpenACC kernels. I have two files with the following content, inspired by a discussion on the NVIDIA website. The file main.cu is the following:

#include <cstdio>
#include <cstdlib>

// Fortran routine with an OpenACC kernel, compiled separately with the PGI compiler.
extern "C" void saxpy(int*, float*, float*, float*);

int main(int argc, char **argv)
{
    float* x;
    float* y;
    float* dx;
    float* dy;

    int n = 1<<20;

    // Allocate and initialize the host arrays.
    x = (float*) malloc(n*sizeof(float));
    y = (float*) malloc(n*sizeof(float));

    for (int i=0; i<n; ++i)
    {
        x[i] = 1.f;
        y[i] = 0.f;
    }

    // Allocate the device arrays and copy the host data to the device.
    cudaMalloc((void**) &dx, (size_t) n*sizeof(float));
    cudaMalloc((void**) &dy, (size_t) n*sizeof(float));

    cudaMemcpy(dx, x, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, (size_t) n*sizeof(float), cudaMemcpyHostToDevice);

    // Call the Fortran/OpenACC routine on the device pointers.
    float factor = 2.f;
    saxpy(&n, &factor, dx, dy);

    cudaMemcpy(y, dy, (size_t) n*sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f, %f\n", y[0], y[n-1]);

    cudaFree(dx);
    cudaFree(dy);
    free(x);
    free(y);

    return 0;
}
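
As a side note, the CUDA calls above are unchecked; a minimal error-checking sketch that could be dropped into main.cu looks like this (the CHECK_CUDA macro is just an illustrative helper name of mine, not part of the CUDA API):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: print the CUDA error string with file/line and abort if a call fails.
#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Usage, e.g.:
//     CHECK_CUDA(cudaMalloc((void**) &dx, n*sizeof(float)));
//     CHECK_CUDA(cudaMemcpy(dx, x, n*sizeof(float), cudaMemcpyHostToDevice));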

The second file saxpy.f90 is:

subroutine saxpy(n, a, x, y) bind(c, name="saxpy")
    use iso_c_binding, only: c_int, c_float
    implicit none

    integer(kind=c_int), intent(in) :: n
    real(kind=c_float), intent(in) :: a
    real(kind=c_float), dimension(n), intent(in) :: x
    real(kind=c_float), dimension(n), intent(inout) :: y

    integer :: i

    ! x and y are device pointers passed in from the CUDA side, hence deviceptr.
    !$acc parallel loop deviceptr(x, y)
    do i = 1, n
        y(i) = y(i) + a*x(i)
    end do
    !$acc end parallel loop
end subroutine

How do I compile this with nvcc and the PGI compiler combined? I have tried many different options, but I always end up with unresolved externals.

What I tried is pgf90 -ta=tesla:cc35 -acc saxpy.f90 -c for the Fortran file, and that compiles fine. The next step is where I am stuck: nvcc -arch=sm_35 -ccbin pgc++ main.cu saxpy.o yields unresolved externals that I do not know how to resolve. How can I find out which external libraries to include?

Chiel
  • Trying to use pgc++ as a compiler isn't supported. You probably need something like this: https://stackoverflow.com/a/38214143/681865 although I am not in a position to test anything with PGI right now – talonmies Mar 30 '20 at 13:29
  • @talonmies. How will that help me in compiling the cuda device code? Do I need to put that into a separate file? – Chiel Mar 30 '20 at 13:47
  • The pgf90 call in your question already compiles device code. You just need to disable separate device compilation (as shown in the linked answer). You are going to have to work out what PGI libraries you need to link using nvcc yourself – talonmies Mar 30 '20 at 13:50
  • @talonmies. But what about the memory allocations and copies between host and GPU? – Chiel Mar 30 '20 at 13:51
  • What about them? They are in the .cu file, which will be compiled by nvcc; those functions are provided by the CUDA runtime API library, which you need to link – talonmies Mar 30 '20 at 13:56

1 Answer


The symbols are most likely missing since you're not adding either the OpenACC or Fortran runtime libraries to your link. Also, when not using a PGI driver to link, you need to add the "nordc" flag. For example:

% pgfortran -c -ta=tesla:cc70,nordc saxpy.f90                                       
% nvcc -arch=sm_70 -ccbin pgc++ -Xcompiler "-ta=tesla:cc70 -pgf90libs" main.cu saxpy.o
% a.out
2.000000, 2.000000

Though, I'd recommend using pgfortran to link so you can use RDC and don't need to add the Fortran runtime libraries. The -Mnomain flag tells pgfortran not to link the Fortran-generated main program, since main comes from main.cu:

% nvcc -arch=sm_70 -ccbin pgc++ -c main.cu
% pgfortran -Mcuda -ta=tesla:cc70 -Mnomain saxpy.f90 main.o
saxpy.f90:
% a.out
2.000000, 2.000000
Mat Colgrove
  • Great. This works well. I needed to add `-lstdc++` to the `pgfortran` call in order to compile. – Chiel Mar 30 '20 at 15:07
  • I've put up some other interoperability examples like this that you might find helpful. Now I'll have to go back and make sure they don't need a -lstdc++ with recent compilers. https://github.com/jefflarkin/openacc-interoperability – jefflarkin Mar 31 '20 at 13:23
  • Jeff, there's also the pgfortran flag "-c++libs" which will have the driver add the C++ libraries to the link. – Mat Colgrove Mar 31 '20 at 13:49
  • @jefflarkin That is really helpful. I used the PGI18 compiler, as that one is compatible with CUDA10 on the supercomputer that I use. – Chiel Apr 01 '20 at 10:00