
I'm trying to parallelize an existing application. I have most of it parallelized and running on the GPU, but I'm having trouble migrating one function.

The function calls dtrsv, which is part of the BLAS library; see below.

void dtrsv_call_N(double* B, double* A, int* n, int* lda, int* incx) {
  F77_CALL(dtrsv)("L","T","N", n, B, lda, A, incx);
}
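For reference, F77_CALL looks like R's macro for calling Fortran symbols from C, so the wrapper above solves A^T * x = b for a lower-triangular matrix; note the argument names are swapped relative to the usual convention (B is the matrix, A is the vector). Below is a minimal host-only sketch of the same call made directly against the Fortran BLAS symbol; the dtrsv_ prototype and the tiny 2x2 test data are my own illustration, assuming a system BLAS linked with -lblas.

#include <stdio.h>

/* Fortran BLAS symbol; every argument is passed by pointer. */
extern void dtrsv_(const char *uplo, const char *trans, const char *diag,
                   const int *n, const double *a, const int *lda,
                   double *x, const int *incx);

int main(void) {
    /* 2x2 lower-triangular matrix in column-major order: [[2,0],[1,4]] */
    double a[4] = {2.0, 1.0, 0.0, 4.0};
    double x[2] = {6.0, 8.0};   /* right-hand side, overwritten with the solution */
    int n = 2, lda = 2, incx = 1;

    /* Same arguments as dtrsv_call_N above: solve A^T * x = b. */
    dtrsv_("L", "T", "N", &n, a, &lda, x, &incx);

    printf("x = [%g, %g]\n", x[0], x[1]);
    return 0;
}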

I've been able to call the equivalent CUDA/cuBLAS function as below, and the results match those produced by the Fortran dtrsv subroutine.

status = cublasDtrsv(handle,CUBLAS_FILL_MODE_LOWER,CUBLAS_OP_T,CUBLAS_DIAG_NON_UNIT, x, dev_m1, x, dev_m2, c);

if (status != CUBLAS_STATUS_SUCCESS) {
    printf("!!!! kernel execution error.\n");
    return EXIT_FAILURE;
}
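For completeness, here is a hypothetical host-side harness around that call. Only the cublasDtrsv line and the status check mirror the snippet above; the allocation, copy, and cleanup code, and the names solve_on_gpu, h_A, and h_b, are assumptions made up for illustration, not taken from the original program.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Solve L^T * x = b on the GPU; h_b is overwritten with the solution. */
int solve_on_gpu(const double *h_A, double *h_b, int n, int lda) {
    cublasHandle_t handle;
    double *dev_m1 = NULL, *dev_m2 = NULL;   /* triangular matrix and right-hand side */
    cublasStatus_t status;

    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return EXIT_FAILURE;
    cudaMalloc((void**)&dev_m1, (size_t)lda * n * sizeof(double));
    cudaMalloc((void**)&dev_m2, (size_t)n * sizeof(double));
    cudaMemcpy(dev_m1, h_A, (size_t)lda * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_m2, h_b, (size_t)n * sizeof(double), cudaMemcpyHostToDevice);

    /* Same call as in the question, with incx = 1. */
    status = cublasDtrsv(handle, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_T,
                         CUBLAS_DIAG_NON_UNIT, n, dev_m1, lda, dev_m2, 1);
    if (status != CUBLAS_STATUS_SUCCESS) {
        printf("!!!! kernel execution error.\n");
        return EXIT_FAILURE;
    }

    cudaMemcpy(h_b, dev_m2, (size_t)n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev_m1);
    cudaFree(dev_m2);
    cublasDestroy(handle);
    return EXIT_SUCCESS;
}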

My problem is that I need to be able to call cublasDtrsv from a __device__ or __global__ function, like the one below:

__global__ void Dtrsv__cm2(cublasHandle_t handle,cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const double *A, int lda, double *x, int incx){
    cublasDtrsv(handle,uplo,trans,diag, n, A, lda, x, incx);
}

In CUDA 4.0, trying to compile this gives the error below. Does anyone know of a way to call cuBLAS functions from a __device__ or __global__ function?

error: calling a host function("cublasDtrsv_v2") from a __device__/__global__ function("Dtrsv__dev") is not allowed

  • No, this isn't possible. CUBLAS is a host API and library. – talonmies Aug 31 '12 at 17:40
  • Thanks very much for the quick answer. Does anyone know if there is an equivalent that can be called from a device function or a global function? – C oneil Aug 31 '12 at 18:39

1 Answer


CUDA Toolkit 5.0 introduced a device linker that can link separately compiled device object files. I believe the CUBLAS functions shipped with CUDA Toolkit 5.0 can now be called from device functions (but I have only reviewed the headers; I have no experience using CUBLAS this way).
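For what it's worth, a rough sketch of what a device-side call could look like with the CUDA 5 device CUBLAS library follows. This is an assumption based on the documented API rather than tested code: the kernel name Dtrsv_device is made up, it needs a compute capability 3.5 device, and it needs separate compilation plus linking against the device library (something along the lines of nvcc -arch=sm_35 -rdc=true file.cu -lcublas_device -lcudadevrt).

#include <cublas_v2.h>

__global__ void Dtrsv_device(int n, const double *A, int lda, double *x, int incx) {
    /* Device-side CUBLAS: the handle is created inside the kernel and the
       library call launches a child kernel via dynamic parallelism. */
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return;

    cublasDtrsv(handle, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_T,
                CUBLAS_DIAG_NON_UNIT, n, A, lda, x, incx);

    cublasDestroy(handle);
}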

  • I can compile the CU file that references CUBLAS from device code; I'm trying to figure out how to link it :) – Eugene Aug 31 '12 at 19:19
  • I don't understand how this can possibly work. The CUBLAS libraries are supplied as IA32/x86_64 binary-only libraries, and internally the routines call host-side runtime API functions and launch kernels. How could such code ever run on the GPU? – talonmies Sep 01 '12 at 00:38
  • @talonmies Separate compilation. The libraries themselves contain relocatable device code. – Eugene Sep 03 '12 at 20:37
  • Thanks all for the replies. What I have tried since is to compile the code in http://www.netlib.org/clapack/cblas/dtrsv.c as a device function (a sketch of that approach follows these comments). So far it seems to be okay; when I have finished my development I'll confirm whether it works. – C oneil Sep 11 '12 at 21:28
  • It certainly [is possible](http://docs.nvidia.com/cuda/cublas/index.html#topic_3_7) to call CUBLAS routines from device code, a new feature in CUDA 5. I'm not sure why this answer was downvoted. It uses the dynamic parallelism feature and therefore requires a cc 3.5 or better device. Upvoting. Please upvote again so we can get this off the unanswered list. – Robert Crovella Jul 11 '13 at 02:39
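For reference, a minimal sketch of the approach C oneil describes above: porting the reference dtrsv to a plain __device__ function. It covers only the single case the question uses (uplo = 'L', trans = 'T', diag = 'N', incx = 1, column-major storage); the function name and the simplified argument list are my own, not the netlib code.

__device__ void dtrsv_lower_trans_nonunit(int n, const double *a, int lda, double *x) {
    /* A is lower triangular, so A^T is upper triangular: back substitution,
       overwriting x with the solution of A^T * x = b. */
    for (int j = n - 1; j >= 0; --j) {
        double temp = x[j];
        for (int i = n - 1; i > j; --i) {
            temp -= a[i + j * lda] * x[i];   /* column-major A(i,j) */
        }
        x[j] = temp / a[j + j * lda];        /* non-unit diagonal */
    }
}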