In here Robert Crovella said that cublas routines can be called from device code. Although I am using dynamic parallelism and compiling with compute capability 3.5, I cannot manage to call Cublas routines from a device function. I always get the error "calling a host function from a device/global function is not allowed" My code contains device functions which call CUBLAS routines like cublsAlloc
, cublasGetVector
, cublasSetVector
and cublasDgemm
My compilation and linking commands:
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc GPUutil.cu -o ./build/GPUutil.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc DivideParalelo.cu -o ./build/DivideParalelo.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -lcudadevrt -L/usr/local/cuda/lib64 -L~/Intel/composer_xe_2015.0.090/mkl/lib/intel64 -L~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64 -Wl,--start-group ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.a ~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64/libiomp5.a -Wl,--end-group -lpthread -lm -lcublas -lcudart -o DivideParalelo