I need to create a shared library for CUDA. The library itself compiles fine, but when I try to use it in my program, nvcc reports a linker (nvlink) or ptxas error.

I reduced the problem to the following code. The library must replace different C functions (here: memset). The library consists of three C++ files:

FileA.h

#ifndef FILEA_H_
#define FILEA_H_

namespace A {
    __device__ 
    void* memset(void* _in, int _val, int _size);
};
#endif

FileA.cpp

#include "FileA.h"

__device__ 
void* A::memset(void* _in, int _val, int _size) {
    char* tmp = (char*)_in;
    for(int i = 0; i < _size; i++) tmp[i] = _val;
    return _in;
}

TempClass.h

#ifndef TEMPCLASS_H_
#define TEMPCLASS_H_

#include "FileA.h"

namespace A {
    template <typename T>
    class TC {
    public:
        __device__ 
        TC() {
            data = new T[10];
        }

        __device__ 
        ~TC(){
            delete [] data;
        }

        __device__ 
        void clear(){
            A::memset(data, 0, 10*sizeof(T));
        }

        T* data;
    };
};
#endif

Using the following commands I create a shared library:

nvcc -Xcompiler -fPIC -x cu -rdc=true -c FileA.cpp -o FileA.o
nvcc -Xcompiler -fPIC --shared -o libTestA.so FileA.o -lcudart

This library should be used in a main program:

main.cpp

#include <cuda.h>
#include <TempClass.h>
#include <iostream>

__device__
int doSomthing() {
    A::TC<int>* tc = new A::TC<int>();
    tc->clear();
    for (int i = 0; i < 5; i++) tc->data[i] = i;

    int sum = 0;
    for (int i = 0; i < 5; i++)  sum += tc->data[i];
    delete tc;
    return sum;
}

__global__
void kernel(int* _res) {
    _res[0] = doSomthing();
}

int main(int argc, char** argv) {
    int* devVar;
    int* hostVar;
    hostVar = new int[1];
    hostVar[0] = -1;
    cudaMalloc(&devVar, sizeof(int));
    cudaMemcpy(devVar, hostVar, sizeof(int), cudaMemcpyHostToDevice);

    kernel<<< 1, 1>>> (devVar);

    cudaMemcpy(hostVar, devVar, sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "kernel done. sum " << *hostVar << std::endl;

    return 0;
}

If I try to compile the program with the commands:

nvcc -Xcompiler -fPIC -I. -L. -rdc=true -x cu -c main.cpp -o main.o 
nvcc -Xcompiler -fPIC -I. -L. main.o -o main -lTestA

I receive the error message:

nvlink error   : Undefined reference to '_ZN1A6memsetEPvii' in 'main.o'

I receive the same error if I try to compile the file directly:

nvcc -Xcompiler -fPIC -I. -L. -rdc=true -x cu main.cpp -o main -lTestA

The command nm libTestA.so shows that the library contains the function symbol _ZN1A6memsetEPvii.

When I remove the -rdc=true option while linking I receive a ptxas error:

ptxas fatal   : Unresolved extern function '_ZN1A6memsetEPvii'

In my case static linking is not an option; I need a shared library. I've also tried making memset an extern "C" function, but that collides with the original C function. The code compiles correctly with g++. Do you have any suggestions for how to solve this problem?

Tim
  • 1. `FileA.cpp` should be `FileA.cu`. 2. You're not compiling `FileA.cu`; why would you expect the linking to work? 3. If you "need" a shared library, why are you linking statically? I don't understand. – einpoklum Feb 22 '18 at 17:24
  • 1. The -x cu option lets nvcc treat .cpp files as .cu files. 2. I compile it into a shared library (see the commands before main.cpp). 3. Where do you see any static linking? Maybe I'm missing something. – Tim Feb 22 '18 at 19:39
  • 1. You're right that nvcc will accept it as a CUDA file, but you should still call it what it is. 2. Ah, so the two earlier command lines run before the two later ones... now I get it. – einpoklum Feb 22 '18 at 20:01
  • @RobertCrovella: That's an answer, I think... but it's also terrible! Why doesn't nvcc support this? – einpoklum Feb 22 '18 at 22:05

1 Answer


It appears that you are attempting to do device-code linking across a library boundary. Currently, that is only possible with a static library.
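For the static route, the build could look roughly like the following. This is only a sketch that reuses the file names from the question; the archive name libTestA.a is an assumption, and the option that bundles the objects into a static library is nvcc's `--lib`.

```shell
# Compile the library source with relocatable device code.
nvcc -x cu -rdc=true -c FileA.cpp -o FileA.o
# Bundle the object into a static archive (libTestA.a is an assumed name).
nvcc --lib FileA.o -o libTestA.a
# Compile the main program with relocatable device code as well.
nvcc -I. -x cu -rdc=true -c main.cpp -o main.o
# nvcc runs the device link step itself when it links these inputs.
nvcc main.o libTestA.a -o main
```

Passing the archive by name on the last line keeps it unambiguous which library the device linker sees.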

The options that I am aware of would be to switch to a static library/link arrangement, or else refactor your code so that you do not need to link device code across a dynamic library boundary.
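One possible shape for such a refactoring is to define the device function inline in the header, so that every translation unit including it carries its own copy and no device symbol has to be resolved from the shared library. The sketch below is an assumption about how FileA.h could be rewritten, not a layout taken from the question; the A_DEVICE macro is a hypothetical helper that lets the same header also compile with a plain host compiler.

```cpp
// FileA.h, header-only variant (a sketch; FileA.cpp is no longer needed for memset)
#ifndef FILEA_H_
#define FILEA_H_

// Hypothetical helper macro: expands to __device__ under nvcc,
// and to nothing when a plain C++ compiler sees the header.
#ifdef __CUDACC__
#define A_DEVICE __device__
#else
#define A_DEVICE
#endif

namespace A {
    // Defined inline, so the definition is visible wherever it is called
    // and nothing has to be device-linked across the .so boundary.
    A_DEVICE inline void* memset(void* _in, int _val, int _size) {
        char* tmp = static_cast<char*>(_in);
        for (int i = 0; i < _size; i++) tmp[i] = static_cast<char>(_val);
        return _in;
    }
}
#endif
```

With a header like this, main.cpp from the question should build without -rdc=true, for example with nvcc -I. -x cu main.cpp -o main, and libTestA.so would then only need to carry host code or complete kernels of its own.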

Robert Crovella
  • Thank you very much for your answer. I will have to live with this until NVIDIA lifts the limitation. I hope that happens soon, because it is really inconvenient. – Tim Feb 23 '18 at 21:57
  • This limitation has been in place for years, since rdc was introduced. If you want to see a change you may want to file a request at developer.nvidia.com – Robert Crovella Feb 23 '18 at 22:31