Questions tagged [cublas]

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library for use with CUDA-capable GPUs.

The cuBLAS library is an implementation of the standard BLAS (Basic Linear Algebra Subprograms) API on top of the NVIDIA CUDA runtime.

Since CUDA 4.0, the library has included implementations of all 152 standard BLAS routines, supporting single-precision real and complex arithmetic on all CUDA-capable devices, and double-precision real and complex arithmetic on those CUDA-capable devices with double-precision support. The library includes host API bindings for C and Fortran, and CUDA 5.0 introduced a device API for use within CUDA kernels.

The library ships with every version of the CUDA toolkit and has a dedicated homepage at http://developer.nvidia.com/cuda/cublas.

330 questions
7 votes · 1 answer

Copying array of pointers into device memory and back (CUDA)

I am trying to use the cuBLAS function cublasSgemmBatched in a toy example. I first allocate 2D arrays h_AA and h_BB of size [6][5], and h_CC of size [6][1]. After that I copy them to the device, perform cublasSgemmBatched, and…
Mikhail Genkin
  • 3,247
  • 4
  • 27
  • 47
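The batched question above hinges on cublasSgemmBatched taking arrays of pointers, one pointer per matrix in the batch, with the pointer arrays themselves required to live in device memory. A minimal host-side C sketch of the computation and the pointer-array layout (column-major, no transposes, alpha = 1 and beta = 0 assumed):

```c
#include <assert.h>
#include <math.h>

/* Multiply a batch of column-major m x k by k x n matrices, mimicking the
 * pointer-array convention of cublasSgemmBatched: A[], B[], C[] are arrays
 * of pointers, one matrix per batch entry. (CPU sketch only; with cuBLAS
 * the pointer arrays must additionally be copied to device memory.) */
static void sgemm_batched_ref(int m, int n, int k,
                              const float *A[], const float *B[],
                              float *C[], int batch) {
    for (int b = 0; b < batch; ++b)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += A[b][p * m + i] * B[b][j * k + p];  /* column-major */
                C[b][j * m + i] = acc;
            }
}
```

The common pitfall in the linked question is copying the host array of device pointers with an ordinary assignment instead of a cudaMemcpy of the pointer array itself.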
7 votes · 3 answers

BLAS and CUBLAS

I'm wondering about NVIDIA's cuBLAS library. Does anybody have experience with it? For example, if I write a C program using BLAS, will I be able to replace the calls to BLAS with calls to cuBLAS? Or, even better, implement a mechanism which lets the…
Nils
  • 13,319
  • 19
  • 86
  • 108
7 votes · 2 answers

Normal Cuda Vs CuBLAS?

Just out of curiosity: cuBLAS is a library for basic matrix computations, but these computations can, in general, also be written easily in plain CUDA code without using cuBLAS. So what is the major difference between the cuBLAS library and your own…
Fontaine007
  • 577
  • 2
  • 8
  • 18
7 votes · 3 answers

How to normalize matrix columns in CUDA with max performance?

How do I efficiently normalize matrix columns in CUDA? My matrix is stored in column-major order, and the typical size is 2000x200. The operation can be represented by the following MATLAB code: A = rand(2000,200); A = exp(A); A = A./repmat(sum(A,1),…
kangshiyin
  • 9,681
  • 1
  • 17
  • 29
6 votes · 1 answer

cublasSgemm row-major multiplication

I'm trying to use cublasSgemm to multiply two non-square matrices that are stored in row-major order. I know that this function has a parameter where you can specify whether you want to transpose the matrices (CUBLAS_OP_T), but the result is…
Lane
  • 161
  • 2
  • 14
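The standard trick for row-major data with a column-major GEMM is the identity C = A·B ⟺ Cᵀ = Bᵀ·Aᵀ: a row-major array is exactly the column-major view of its transpose, so swapping the two operands and the m/n dimensions (no CUBLAS_OP_T needed) yields the row-major product. A CPU sketch, with a plain column-major GEMM standing in for cublasSgemm:

```c
#include <assert.h>

/* Column-major reference GEMM, C(m x n) = A(m x k) * B(k x n), standing in
 * for cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, ...). */
static void gemm_colmajor(int m, int n, int k,
                          const float *A, const float *B, float *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[p * m + i] * B[j * k + p];
            C[j * m + i] = acc;
        }
}

/* Row-major C = A * B via C^T = B^T * A^T: pass the operands swapped with
 * the m/n dimensions exchanged; the column-major result, read back as
 * row-major, is exactly C. */
static void gemm_rowmajor(int m, int n, int k,
                          const float *A, const float *B, float *C) {
    gemm_colmajor(n, m, k, B, A, C);
}
```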
6 votes · 1 answer

CUBLAS matrix multiplication

After implementing matrix multiplication with CUDA, I tried to implement it with CUBLAS (thanks to the advice of some people here in the forum). I can multiply square matrices, but (yes, once again...) I am having difficulties working with non-square…
Bernardo
  • 531
  • 1
  • 13
  • 31
6 votes · 1 answer

CUBLAS - is matrix-element exponentiation possible?

I'm using CUBLAS (the CUDA BLAS library) for matrix operations. Is it possible to use CUBLAS to compute the exponentiation/root mean square of a matrix's items? I mean, having the 2x2 matrix 1 4 9 16, what I want is a function to raise the elements to a given value…
Marco A.
  • 43,032
  • 26
  • 132
  • 246
6 votes · 3 answers

Failed to create CUBLAS handle. Tensorflow interaction with OpenCV

I am trying to use a PlayStation Eye camera for a deep reinforcement learning project. The network, TensorFlow installation (0.11), and CUDA (8.0) are functional, because I have been able to train the network on a simulation. Now, when I am trying to…
RandomEngineer
  • 61
  • 1
  • 1
  • 2
6 votes · 1 answer

Cudafy cannot find cublas, cudafft

Thanks for reading my thread. Cudafy cannot load cublas64_55.dll. I am using Windows 7, VS2012, and CUDA 5.5. My cublas64_55.dll, cufft64_35.dll, etc. are all in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin, and my environment…
Ono
  • 1,357
  • 3
  • 16
  • 38
6 votes · 2 answers

Should we reuse the cublasHandle_t across different calls?

I'm using the latest version, CUDA 5.5, and the new CUBLAS API is stateful: every function needs a cublasHandle_t, e.g. cublasHandle_t handle; cublasCreate_v2(&handle); cublasDgemm_v2(handle, A_trans, B_trans, m, n, k, &alpha, d_A, lda,…
SkyWalker
  • 13,729
  • 18
  • 91
  • 187
6 votes · 2 answers

Element-by-element vector multiplication with CUDA

I have built a rudimentary kernel in CUDA to do an element-wise vector-vector multiplication of two complex vectors. The kernel code is inserted below (multiplyElementwise). It works fine, but since I noticed that other seemingly straightforward…
WVDB
  • 63
  • 1
  • 1
  • 3
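The element-wise (Hadamard) product in the question is not a classic BLAS operation; one workaround sometimes suggested is cublasCdgmm (treating one vector as a diagonal matrix), though a dedicated kernel like the question's multiplyElementwise is the more direct route. The reference computation in plain C99:

```c
#include <assert.h>
#include <complex.h>

/* Element-wise product of two complex vectors: z[i] = x[i] * y[i].
 * This is what the question's multiplyElementwise kernel computes,
 * one element per thread. */
static void cmul_elementwise(const float complex *x, const float complex *y,
                             float complex *z, int n) {
    for (int i = 0; i < n; ++i)
        z[i] = x[i] * y[i];
}
```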
6 votes · 2 answers

Add scalar to vector in BLAS (cuBLAS/CUDA)

I don't know if I'm just overlooking something obvious, but despite googling around I see no way to simply add a scalar to a vector (or matrix) using BLAS operations. I'm trying to do this in cuBLAS/CUDA, so I'll take any way to accomplish this…
Matt Phillips
  • 9,465
  • 8
  • 44
  • 75
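There is indeed no scalar-plus-vector routine in BLAS. A common workaround is axpy against a constant vector: y ← α·x + y with x filled with ones (or, if the implementation tolerates it, a single 1.0f read with incx = 0, which the BLAS specification does not guarantee). The reference semantics of the axpy-based trick:

```c
#include <assert.h>

/* Reference for [cublasS]axpy semantics: y[i] += alpha * x[i * incx].
 * With x pointing at a single 1.0f and incx = 0 (or x a ones vector and
 * incx = 1), this adds the scalar alpha to every element of y. */
static void saxpy_ref(int n, float alpha, const float *x, int incx, float *y) {
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i * incx];
}
```

Calling saxpy_ref(n, s, &one, 0, y) with one = 1.0f adds s to every element of y; the ones-vector variant is the portable choice for real BLAS libraries.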
6 votes · 1 answer

Will the cublas kernel functions automatically be synchronized with the host?

Just a general question about cuBLAS: for a single thread, if there is no memory transfer from GPU to CPU (e.g. cublasGetVector), will the cuBLAS kernel functions (e.g. cublasDgemm) automatically be synchronized with the host? …
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
5 votes · 3 answers

Doing multiple matrix-matrix multiplications in one operation

I'm implementing an algorithm that, in essence, is a series of matrix-matrix multiplications like this: Res = M1.M2.M3. ... .Mn. My matrices are really small, 100x100 floats, but the sequence is really long, on the order of billions. I tried using…
Martin Kristiansen
  • 9,875
  • 10
  • 51
  • 83
5 votes · 2 answers

CUBLAS - matrix addition.. how?

I am trying to use CUBLAS to sum two big matrices of unknown size. I need fully optimized code (if possible), so I chose not to rewrite the matrix-addition code (simple) but to use CUBLAS, in particular the cublasSgemm function, which allows summing A…
Marco A.
  • 43,032
  • 26
  • 132
  • 246
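Abusing cublasSgemm for addition works but pays for a full multiply; the cuBLAS extension routine cublasSgeam (added around CUDA 5.0) computes C = α·op(A) + β·op(B) directly. Its no-transpose semantics for column-major m x n matrices are simply:

```c
#include <assert.h>

/* Reference for cublasSgeam with CUBLAS_OP_N on both operands:
 * C = alpha * A + beta * B, all matrices column-major m x n.
 * alpha = beta = 1.0f gives plain matrix addition. */
static void sgeam_ref(int m, int n, float alpha, const float *A,
                      float beta, const float *B, float *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            C[j * m + i] = alpha * A[j * m + i] + beta * B[j * m + i];
}
```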