Can you use cublasDdot() to use blas operations in non-GPU memory?

Question

So I have code that performs matrix multiplication, but it returns just zeroes when I compile with nvcc and link with -lcublas; however, the same code runs great, with just a few tweaks to function names, when I compile with g++ and link with -lblas.

Can you use the -lcublas library to perform matrix multiplication from memory that is not on the GPU?

Here's the code that returns 0's:

extern "C" //external reference to function so the code compiles
{
    double cublasDdot(int *n, double *A, int *incA, double *B, int *incB);
}

//stuff happens

    cout << "Calculating/printing the contents of Matrix C for ddot...\n";
            C[i][t]=cublasDdot(&n, partA, &incA, partB, &incB); //This thing isn't working for some reason (although it compiles just fine)

I compile it by using this command: nvcc program -lcublas

This does work, however:

extern "C" //external reference to function so the code compiles
{
    double ddot_(int *n, double *A, int *incA, double *B, int *incB);
}

//stuff happens

C[i][t]=ddot_(&n, partA, &incA, partB, &incB);

compiled with g++ program -lblas

I'm pretty confident now that the answer is "no." I'll post an answer later though, after giving this a little more time. – Mechy, May 18 '13 at 20:49

score 1 · Accepted Answer · answered May 18 '13 at 21:16

cublas requires a properly functioning CUDA GPU.

You are probably not doing any error checking. Read up on how to do error checking in the cublas manual, and look at some error-checking sample code.

The ordinary usage of cublas requires data to be transferred to the GPU and results to be transferred back.

Robert Crovella


On the last sentence: it is possible to run CUBLAS on pinned, mapped host memory. It isn't fast, but it does work. – talonmies May 18 '13 at 22:11
That's true, although I could argue that even in that case the data is transferred to the GPU and the results are transferred back, it's just not explicitly done in host source code. It's also possible that the data may have been generated on the GPU, and the results may be consumed on the GPU by subsequent operations, in which case no explicit data transfers of any kind are required, at least for the cublas operation in question. – Robert Crovella May 18 '13 at 22:19
