
Just out of curiosity: cuBLAS is a library for basic matrix computations. But these computations can, in general, also be written easily in plain CUDA code, without using cuBLAS. So what is the major difference between the cuBLAS library and your own CUDA program for matrix computations?
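
To make the comparison concrete, here is a minimal, hedged sketch (untested; assumes `m x n` column-major float matrices already allocated on the device, and the function names are mine) of a simple matrix addition done both ways — a hand-written kernel versus the corresponding cuBLAS routine `cublasSgeam`:

```cuda
#include <cublas_v2.h>

// Hand-written kernel: elementwise C = A + B, one thread per element.
__global__ void matAdd(const float *A, const float *B, float *C, int total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total)
        C[i] = A[i] + B[i];
}

// Launch for an m x n matrix:
//   matAdd<<<(m * n + 255) / 256, 256>>>(dA, dB, dC, m * n);

// Equivalent cuBLAS call: C = 1*A + 1*B (column-major, no transposes).
void addWithCublas(cublasHandle_t h, int m, int n,
                   const float *dA, const float *dB, float *dC)
{
    const float one = 1.0f;
    cublasSgeam(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n,
                &one, dA, m, &one, dB, m, dC, m);
}
```

For an operation this simple the two versions should perform similarly; the differences the answers below describe show up in testing, maintenance, and in more complex routines like GEMM.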

Fontaine007
  • Is it similar to the relationship between normal C code and a BLAS library on the CPU, where the library does compiler-level optimization? But a GPU is intrinsically multi-threaded, so the situation may not be quite like that on a CPU. Say, a matrix addition. – Fontaine007 Sep 14 '14 at 17:35

2 Answers


We highly recommend that developers use cuBLAS (or cuFFT, cuRAND, cuSPARSE, Thrust, NPP) whenever suitable, for many reasons:

  • We validate correctness across every supported hardware platform, including platforms we know are coming but that may not have been released yet. For complex routines, it is entirely possible to have bugs which show up on one architecture (or even one chip) but not on others. This can even happen with changes to the compiler, the runtime, etc.
  • We test our libraries for performance regressions across the same wide range of platforms.
  • We can fix bugs in our code if you find them. Hard for us to do this with your code :)
  • We are always looking for reusable, useful bits of functionality that can be pulled into a library - this saves you a ton of development time, and makes your code easier to read because it is written against a higher-level API.

Honestly, at this point, I can probably count on one hand the number of developers out there who actually implement their own dense linear algebra routines rather than calling cuBLAS. It's a good exercise when you're learning CUDA, but for production code it's usually best to use a library.

(Disclosure: I run the CUDA Library team)

Jonathan Cohen
    Then please provide the source code or else it is very time-consuming if something goes wrong due to a bug in cuBLAS – psihodelia Aug 24 '15 at 14:50
  • Also, cuBLAS functions are no longer launchable from kernels (starting from CUDA 10.0)? That makes it a lot less useful. [link to forum post](https://devtalk.nvidia.com/default/topic/1046849/cuda-programming-and-performance/cublas-call-from-kernel-in-cuda-10-0/) – Kari May 10 '19 at 08:12
  • According to a talk at GTC Spring 2021 https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31754/ (free access, but account registration required), cuBLASDx, a cuBLAS version that can be fused with custom kernels, is upcoming. There are already corresponding releases for cuFFT and cuSOLVER. – Sebastian Jul 03 '21 at 09:11
  • What are you afraid of? Open-source the code and keep the copyright; that's what would move usage forward. Any serious competitor can reverse engineer it anyway, so why not take the proper step and provide it open rather than closed. – John Apr 20 '23 at 21:06
  • @John that's not open source, that's source-available which is nowhere near the same thing – somebody Apr 28 '23 at 08:17

There are several reasons you'd choose to use a library instead of writing your own implementation. Three, off the top of my head:

  1. You don't have to write it. Why do work when somebody else has done it for you?
  2. It will be optimised. NVIDIA-supported libraries such as cuBLAS are likely to be optimised for all current GPU generations, and later releases will be optimised for later generations. While most BLAS operations may seem fairly simple to implement, getting peak performance requires optimising for the hardware (this is not unique to GPUs). A simple implementation of SGEMM, for example, may be many times slower than an optimised version.
  3. They tend to work. There's probably less chance you'll run up against a bug in a library than that you'll introduce a bug in your own implementation which bites you when you change some parameter or other in the future.
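
As a hedged illustration of point 2 (an untested sketch; column-major layout, no transposes, and the kernel name is mine): this is roughly what a "simple implementation" of SGEMM looks like — one thread per output element, no shared-memory tiling — next to the single cuBLAS call that replaces it:

```cuda
// Naive SGEMM: C = alpha * A * B + beta * C, with A (m x k), B (k x n),
// C (m x n), all column-major. One thread computes one element of C,
// so every thread re-reads a full row of A and column of B from
// global memory - this is the main reason it is slow.
__global__ void sgemmNaive(int m, int n, int k, float alpha,
                           const float *A, const float *B,
                           float beta, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < n) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p)
            acc += A[row + p * m] * B[p + col * k];   // column-major indexing
        C[row + col * m] = alpha * acc + beta * C[row + col * m];
    }
}

// The cuBLAS equivalent - tiled, vectorised, and tuned per architecture:
//   cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
//               &alpha, dA, m, dB, k, &beta, dC, m);
```

The library version wins by blocking the computation into shared memory and registers so each element of A and B is loaded from global memory far fewer times, with tile sizes chosen per GPU generation.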

The above isn't just relevant to cuBLAS: if the method you need is in a well-supported library, you'll probably save a lot of time and gain a lot of performance by using it rather than your own implementation.

Jez
  • +1, but as for point 3, hunting down bugs in libraries can turn out pretty ugly, and it does happen... occasionally. It may also be worth mentioning that for sufficiently specific problems you can always write some "custom-tailored" code that performs better in your case. – Michal Hosala Sep 14 '14 at 18:09