
Here Robert Crovella said that cuBLAS routines can be called from device code. Although I am using dynamic parallelism and compiling for compute capability 3.5, I cannot manage to call cuBLAS routines from a device function; I always get the error "calling a host function from a device/global function is not allowed". My code contains device functions that call cuBLAS routines such as cublasAlloc, cublasGetVector, cublasSetVector, and cublasDgemm.

My compilation and linking commands:

```
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc GPUutil.cu -o ./build/GPUutil.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc DivideParalelo.cu -o ./build/DivideParalelo.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -lcudadevrt -L/usr/local/cuda/lib64 -L~/Intel/composer_xe_2015.0.090/mkl/lib/intel64 -L~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64 -Wl,--start-group ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.a ~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64/libiomp5.a -Wl,--end-group -lpthread -lm -lcublas -lcudart -o DivideParalelo
```
  • Your compilation commands are not correct. You are not linking against `-lcublas_device` and there are other issues. You might want to refer to the [cuda sample codes](http://docs.nvidia.com/cuda/cuda-samples/index.html#simpledevlibcublas-gpu-device-api-library-functions--cuda-dynamic-parallelism-) that show how to use cublas from the device, and include makefiles that you can study. [This question/answer](http://stackoverflow.com/questions/27094612/cublas-matrix-inversion-from-device) gives a completely worked example including compile commands. – Robert Crovella Mar 19 '15 at 14:14
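
Following that comment, here is a hedged sketch of how the device-link and final-link steps from the question might change to pull in the device cuBLAS library, based on the pattern in the linked sample and Q&A (the paths follow the asker's commands, the MKL flags are elided as `...`, and this is an assumption rather than a verified build):

```
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcublas_device -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -L/usr/local/cuda/lib64 -lcublas_device -lcublas -lcudadevrt -lcudart ... -o DivideParalelo
```

The key change is adding `-lcublas_device` at both the device-link and host-link steps so the static device library is resolved.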

1 Answer


Here you can find all the details about the cuBLAS device API, such as:

> Starting with release 5.0, the CUDA Toolkit now provides a static cuBLAS Library cublas_device.a that contains device routines with the same API as the regular cuBLAS Library. Those routines use internally the Dynamic Parallelism feature to launch kernel from within and thus is only available for device with compute capability at least equal to 3.5.
>
> In order to use those library routines from the device the user must include the header file “cublas_v2.h” corresponding to the new cuBLAS API and link against the static cuBLAS library cublas_device.a.
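
For illustration, a minimal sketch of what that device-side usage can look like, assuming square n×n matrices already resident in device memory (the kernel and variable names here are invented, not taken from the question):

```cuda
#include <cublas_v2.h>

// Hypothetical kernel: a single thread creates a device-side cuBLAS handle
// and launches C = A * B (all n x n, column-major) via dynamic parallelism.
__global__ void deviceDgemmExample(int n, const double *A, const double *B, double *C)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        cublasHandle_t handle;
        const double alpha = 1.0;
        const double beta  = 0.0;

        if (cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS) {
            cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        n, n, n, &alpha, A, n, B, n, &beta, C, n);
            cublasDestroy(handle);
        }
    }
}
```

Such a file must be compiled with `-arch=sm_35` (or higher) and relocatable device code (`-dc`/`-rdc=true`), and linked against `cublas_device` as described above.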

If you still experience issues even after reading through the documentation and applying all of the steps described there, then ask for additional assistance.

  • Thanks for your quick answer. I have added `#include "cublas_v2.h"` in my GPUutil.cu file and `-lcublas_device` for the link in the Makefile. However, cuBLAS is not detected and I get a compilation error on every cuBLAS call. I am using Nsight with CUDA 6.5. Any idea what I am doing wrong? – emartel Mar 19 '15 at 16:42
  • @emartel are you compiling for compute capability 3.5? I.e. `sm_35` ? – Michal Hosala Mar 19 '15 at 17:14
  • Yes, I am compiling for cc 3.5. You can see CUDA_FLAGS in my Makefile; it is used when I compile GPUutil.cu (this file contains the calls to cuBLAS routines). – emartel Mar 19 '15 at 17:33