Cublas Matrix LU Decomposition

Question

I'm having trouble calling dgetrf in CUDA. From what I've found, I can only call the batched version (http://docs.nvidia.com/cuda/cublas/#cublas-lt-t-gt-getrfbatched). When I call it, it returns an error value of 7, and I haven't been able to find the enumeration that corresponds to that error code. My code is below; any help would be much appreciated.

void cuda_matrix_inverse (int m, int n, double* a){

    cublasHandle_t handle;
    cublasStatus_t status;
    double **devPtrA = 0;
    double **devPtrA_dev = NULL;
    int *d_pivot_array;
    int *d_info_array;
    int rowsA = m;
    int colsA = n;
    int matrixSizeA;
    cudaError_t error;

    fprintf(stderr,"starting cuda inverse\n");

    error = cudaMalloc((void **)&d_pivot_array, sizeof(int));
    if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));
    error = cudaMalloc((void **)&d_info_array, sizeof(int));
    if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

    fprintf(stderr,"malloced pivot and info\n");

    status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error %i\n",status);

    matrixSizeA = rowsA * colsA;

    devPtrA =(double **)malloc(1 * sizeof(*devPtrA));

    fprintf(stderr,"malloced devPtrA\n");

    error = cudaMalloc((void **)&devPtrA[0], matrixSizeA * sizeof(devPtrA[0][0]));
    if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

    error = cudaMalloc((void **)&devPtrA_dev, 1 * sizeof(*devPtrA));
    if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

    fprintf(stderr,"malloced device variables\n");

    error = cudaMemcpy(devPtrA_dev, devPtrA, 1 * sizeof(*devPtrA), cudaMemcpyHostToDevice);
    if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

    fprintf(stderr,"copied from devPtrA to d_devPtrA\n");

    status = cublasSetMatrix(rowsA, colsA, sizeof(a[0]), a, rowsA, devPtrA[0], rowsA);
    if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error %i\n",status);


    status = cublasDgetrfBatched(handle, m, devPtrA_dev,m,d_pivot_array,d_info_array,1); //cannot get this to work
    if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error in dgetrf %i\n",status);


    fprintf(stderr,"done with cuda inverse\n");
}

score 3 · Accepted Answer · edited May 23 '17 at 12:06

Error code 7 in CUBLAS means CUBLAS_STATUS_INVALID_VALUE. Matrix inversion in CUBLAS is possible for square matrices only, so I assume that m == n in your case. That said, the cublas<t>getrfBatched functions require the pivot array to have length n for each matrix, so you should allocate d_pivot_array as:

error = cudaMalloc((void **)&d_pivot_array, n * sizeof(int));

More generally, for a batch of batchSize matrices, it is allocated as:

error = cudaMalloc((void **)&d_pivot_array, n * batchSize * sizeof(int));
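To see why the pivot array needs n entries per matrix: LU factorization with partial pivoting records one row swap for each of the n elimination steps. Below is a minimal CPU sketch in plain C, not the CUBLAS implementation. The function name lu_partial_pivot and the helper abs_d are hypothetical, and the sketch assumes column-major storage and 1-based pivot indices in the LAPACK getrf style, chosen here for illustration.

```c
/* Hypothetical helper: absolute value without needing <math.h>. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* In-place LU factorization with partial pivoting of an n x n matrix
 * stored column-major in a, with leading dimension lda.
 * pivot must have room for n ints: pivot[k] receives the 1-based index
 * of the row that was swapped with row k at elimination step k.
 * Returns 0 on success, or k + 1 if the pivot at step k is exactly zero. */
static int lu_partial_pivot(int n, double *a, int lda, int *pivot)
{
    for (int k = 0; k < n; ++k) {
        /* Pick the row with the largest magnitude in column k. */
        int p = k;
        for (int i = k + 1; i < n; ++i) {
            if (abs_d(a[i + k * lda]) > abs_d(a[p + k * lda]))
                p = i;
        }
        pivot[k] = p + 1; /* one entry per step, so n entries in total */

        if (a[p + k * lda] == 0.0)
            return k + 1; /* singular at this step */

        /* Swap rows k and p across every column. */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        }

        /* Store the multipliers below the diagonal and update the
         * trailing submatrix. */
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= a[k + k * lda];
            for (int j = k + 1; j < n; ++j)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return 0;
}
```

The loop writes pivot[0] through pivot[n - 1], which is why a buffer of sizeof(int) is too small for any n greater than 1. In the batched case each matrix gets its own block of n pivots, giving the n * batchSize count used above.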

Here is square matrix inversion code that I wrote while testing the CUBLAS functions. The function's input and output are square matrices of type float allocated on the device.

answered Mar 19 '14 at 10:19

sgarizvi


What exactly are spitch and dpitch in this case? Also, if I wanted it self contained, could I simply add a cudaMalloc src_d, and then cudaMemcpy from a to src_d, so that all that is passed in is my source matrix? – David Mar 19 '14 at 23:18
`spitch` and `dpitch` are the pitches of the source and destination matrices when they have been allocated using `cudaMallocPitch`. Otherwise, each is just equal to `n * sizeof(dataType)`. – sgarizvi Mar 20 '14 at 05:24
Yes, to make it self-contained, you can allocate the device matrices inside this function. – sgarizvi Mar 20 '14 at 05:25
