I'm trying to use some LAPACKE functions inside a CUDA kernel to solve small systems of linear equations. I have a main source file that contains the kernel function I want to call. Inside that kernel function I want to call the LAPACKE function LAPACKE_dgesv(), which is defined in a different source file.

In my main source file I have included the header file lapacke.h, which contains the declaration of LAPACKE_dgesv(). In addition, I have edited lapacke.h to prepend __device__ to the declaration of LAPACKE_dgesv().

I added the directory of the source file that contains the definition of LAPACKE_dgesv() to the include_dirs argument of the SourceModule call in my Python code. However, when I run the code, I get this error:

ptxas fatal   : Unresolved extern function 'LAPACKE_dgesv'

My guess is that the source file containing the definition of LAPACKE_dgesv() is not being compiled.

Is there a way to get PyCUDA to compile multiple source files that contain device code? It seems there would need to be a way for PyCUDA to run the CUDA compiler with the --relocatable-device-code=true flag.

Thomas

1 Answer

No, you can't do this with SourceModule.

There is an experimental DynamicSourceModule, added to the master branch very recently, which can probably do what you want, although it isn't well documented and I have never used it. Otherwise, you can always statically compile and device-link the code into a cubin file yourself, outside of PyCUDA, and then load the resulting device code via the standard APIs.
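As a rough illustration of the second route, the sketch below only assembles the nvcc command lines for separate compilation followed by a device link into a cubin; it does not run them. The file names (main_kernel.cu, dgesv_device.cu), the sm_60 architecture, and the helper name build_nvcc_commands are assumptions made for the example, not anything from the question. It also assumes the linear solver is available as real device code, since a __device__ declaration alone does not provide a device definition.

```python
import os


def build_nvcc_commands(sources, arch="sm_60", output="kernels.cubin"):
    """Return nvcc argument lists: one compile per source, then one device link.

    `sources`, `arch`, and `output` are illustrative assumptions; adjust them
    to the real project layout and target GPU.
    """
    if not sources:
        raise ValueError("at least one .cu source file is required")

    commands = []
    objects = []
    for src in sources:
        obj = os.path.splitext(src)[0] + ".o"
        objects.append(obj)
        # Compile each file to an object containing relocatable device code.
        commands.append([
            "nvcc", f"-arch={arch}",
            "--relocatable-device-code=true", "-c", src, "-o", obj,
        ])

    # Device-link all objects into a single cubin with resolved symbols.
    commands.append(
        ["nvcc", f"-arch={arch}", "--device-link", "--cubin"]
        + objects + ["-o", output]
    )
    return commands


# Hypothetical file names, printed as shell-style command lines.
for cmd in build_nvcc_commands(["main_kernel.cu", "dgesv_device.cu"]):
    print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine with the CUDA toolkit installed.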

talonmies
  • So if I manually create a cubin file outside of PyCUDA, could I then load it into my Python code using PyCUDA's `module_from_file` function? – Thomas Jul 10 '17 at 05:18
  • Yes. You also have the advantage of being able to inspect the contents of the cubin beforehand to get the symbol names, in case the device code has C++ linkage. – talonmies Jul 10 '17 at 05:23
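
The loading step discussed in the comments might look roughly like the sketch below. The helper name load_kernel, the cubin path, and the kernel name solve_kernel are hypothetical. The code assumes the kernel was declared extern "C" in the device code; with C++ linkage, the mangled symbol name found in the cubin would have to be passed instead.

```python
import os


def load_kernel(cubin_path, kernel_name):
    """Load a prebuilt cubin and return a handle to one of its kernels."""
    if not os.path.isfile(cubin_path):
        raise FileNotFoundError(cubin_path)

    # Imported lazily so the path check above also works without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda

    module = cuda.module_from_file(cubin_path)
    # With C++ linkage, kernel_name must be the mangled name from the cubin.
    return module.get_function(kernel_name)
```

A call such as load_kernel("kernels.cubin", "solve_kernel") would then return a function object that can be launched like any kernel obtained from SourceModule.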