Questions tagged [cublas]

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library for use with CUDA-capable GPUs.

The cuBLAS library is an implementation of the standard BLAS (Basic Linear Algebra Subprograms) API on top of the NVIDIA CUDA runtime.

Since CUDA 4.0, the library has included implementations of all 152 standard BLAS routines, supporting single-precision real and complex arithmetic on all CUDA-capable devices, and double-precision real and complex arithmetic on those CUDA-capable devices with double-precision support. The library includes host API bindings for C and Fortran, and CUDA 5.0 introduced a device API for use within CUDA kernels.

The library ships with every version of the CUDA toolkit and has a dedicated homepage at http://developer.nvidia.com/cuda/cublas.

330 questions
7 votes · 1 answer

Copying array of pointers into device memory and back (CUDA)

I am trying to use the cuBLAS function cublasSgemmBatched in a toy example. I first allocate 2D arrays h_AA and h_BB of size [6][5], and h_CC of size [6][1]. After that I copy them to the device, perform cublasSgemmBatched, and…
Mikhail Genkin
  • 3,247
  • 4
  • 27
  • 47
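The batched question above hinges on cublasSgemmBatched taking arrays of pointers, one pointer per matrix in the batch, with the pointer arrays themselves required to live in device memory. A minimal host-side C sketch of the computation and the pointer-array layout (column-major, no transposes, alpha = 1 and beta = 0 assumed):

```c
#include <assert.h>
#include <math.h>

/* Multiply a batch of column-major m x k by k x n matrices, mimicking the
 * pointer-array convention of cublasSgemmBatched: A[], B[], C[] are arrays
 * of pointers, one matrix per batch entry. (CPU sketch only; with cuBLAS
 * the pointer arrays must additionally be copied to device memory.) */
static void sgemm_batched_ref(int m, int n, int k,
                              const float *A[], const float *B[],
                              float *C[], int batch) {
    for (int b = 0; b < batch; ++b)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += A[b][p * m + i] * B[b][j * k + p];  /* column-major */
                C[b][j * m + i] = acc;
            }
}
```

The common pitfall in the linked question is copying the host array of device pointers with an ordinary assignment instead of a cudaMemcpy of the pointer array itself.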
7 votes · 3 answers

BLAS and CUBLAS

I'm wondering about NVIDIA's cuBLAS library. Does anybody have experience with it? For example, if I write a C program using BLAS, will I be able to replace the calls to BLAS with calls to cuBLAS? Or, even better, implement a mechanism which lets the…
Nils
  • 13,319
  • 19
  • 86
  • 108
7 votes · 2 answers

Normal Cuda Vs CuBLAS?

Just out of curiosity: cuBLAS is a library for basic matrix computations, but these computations can, in general, also be written easily in plain CUDA code without using cuBLAS. So what is the major difference between the cuBLAS library and your own…
Fontaine007
  • 577
  • 2
  • 8
  • 18
7 votes · 3 answers

How to normalize matrix columns in CUDA with max performance?

How do I efficiently normalize matrix columns in CUDA? My matrix is stored in column-major order, and the typical size is 2000x200. The operation can be represented by the following MATLAB code: A = rand(2000,200); A = exp(A); A = A./repmat(sum(A,1),…
kangshiyin
  • 9,681
  • 1
  • 17
  • 29
6 votes · 1 answer

cublasSgemm row-major multiplication

I'm trying to use cublasSgemm to multiply two non-square matrices that are stored in row-major order. I know that this function has a parameter where you can specify whether you want to transpose the matrices (CUBLAS_OP_T), but the result is…
Lane
  • 161
  • 2
  • 14
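The standard trick for row-major data with a column-major GEMM is the identity C = A·B ⟺ Cᵀ = Bᵀ·Aᵀ: a row-major array is exactly the column-major view of its transpose, so swapping the two operands and the m/n dimensions (no CUBLAS_OP_T needed) yields the row-major product. A CPU sketch, with a plain column-major GEMM standing in for cublasSgemm:

```c
#include <assert.h>

/* Column-major reference GEMM, C(m x n) = A(m x k) * B(k x n), standing in
 * for cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, ...). */
static void gemm_colmajor(int m, int n, int k,
                          const float *A, const float *B, float *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[p * m + i] * B[j * k + p];
            C[j * m + i] = acc;
        }
}

/* Row-major C = A * B via C^T = B^T * A^T: pass the operands swapped with
 * the m/n dimensions exchanged; the column-major result, read back as
 * row-major, is exactly C. */
static void gemm_rowmajor(int m, int n, int k,
                          const float *A, const float *B, float *C) {
    gemm_colmajor(n, m, k, B, A, C);
}
```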
6 votes · 1 answer

CUBLAS matrix multiplication

After implementing matrix multiplication with CUDA, I tried to implement it with CUBLAS (thanks to the advice of some people here in the forum). I can multiply square matrices, but (yes, once again...) I am having difficulties working with non-square…
Bernardo
  • 531
  • 1
  • 13
  • 31
6 votes · 1 answer

CUBLAS - is matrix-element exponentiation possible?

I'm using CUBLAS (the CUDA BLAS library) for matrix operations. Is it possible to use CUBLAS to compute the exponentiation/root mean square of a matrix's items? I mean, having the 2x2 matrix 1 4 9 16, what I want is a function to raise the elements to a given value…
Marco A.
  • 43,032
  • 26
  • 132
  • 246
6 votes · 3 answers

Failed to create CUBLAS handle. Tensorflow interaction with OpenCV

I am trying to use a PlayStation Eye camera for a deep reinforcement learning project. The network, TensorFlow installation (0.11), and CUDA (8.0) are functional, because I have been able to train the network on a simulation. Now, when I am trying to…
RandomEngineer
  • 61
  • 1
  • 1
  • 2
6 votes · 1 answer

Cudafy cannot find cublas, cudafft

Thanks for reading my thread. Cudafy cannot load cublas64_55.dll. I am using Windows 7, VS2012, and CUDA 5.5. My cublas64_55.dll, cufft64_35.dll, etc. are all in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin, and my environment…
Ono
  • 1,357
  • 3
  • 16
  • 38
6 votes · 2 answers

Should we reuse the cublasHandle_t across different calls?

I'm using the latest version, CUDA 5.5, and the new CUBLAS API is stateful: every function needs a cublasHandle_t, e.g. cublasHandle_t handle; cublasCreate_v2(&handle); cublasDgemm_v2(handle, A_trans, B_trans, m, n, k, &alpha, d_A, lda,…
SkyWalker
  • 13,729
  • 18
  • 91
  • 187
6 votes · 2 answers

Element-by-element vector multiplication with CUDA

I have built a rudimentary kernel in CUDA to do an element-wise vector-vector multiplication of two complex vectors. The kernel code is inserted below (multiplyElementwise). It works fine, but since I noticed that other seemingly straightforward…
WVDB
  • 63
  • 1
  • 1
  • 3
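The element-wise (Hadamard) product in the question is not a classic BLAS operation; one workaround sometimes suggested is cublasCdgmm (treating one vector as a diagonal matrix), though a dedicated kernel like the question's multiplyElementwise is the more direct route. The reference computation in plain C99:

```c
#include <assert.h>
#include <complex.h>

/* Element-wise product of two complex vectors: z[i] = x[i] * y[i].
 * This is what the question's multiplyElementwise kernel computes,
 * one element per thread. */
static void cmul_elementwise(const float complex *x, const float complex *y,
                             float complex *z, int n) {
    for (int i = 0; i < n; ++i)
        z[i] = x[i] * y[i];
}
```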
6 votes · 2 answers

Add scalar to vector in BLAS (cuBLAS/CUDA)

I don't know if I'm just overlooking something obvious, but despite googling around I see no way to simply add a scalar to a vector (or matrix) using BLAS operations. I'm trying to do this in cuBLAS/CUDA, so I'll take any way to accomplish this…
Matt Phillips
  • 9,465
  • 8
  • 44
  • 75
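There is indeed no scalar-plus-vector routine in BLAS. A common workaround is axpy against a constant vector: y ← α·x + y with x filled with ones (or, if the implementation tolerates it, a single 1.0f read with incx = 0, which the BLAS specification does not guarantee). The reference semantics of the axpy-based trick:

```c
#include <assert.h>

/* Reference for [cublasS]axpy semantics: y[i] += alpha * x[i * incx].
 * With x pointing at a single 1.0f and incx = 0 (or x a ones vector and
 * incx = 1), this adds the scalar alpha to every element of y. */
static void saxpy_ref(int n, float alpha, const float *x, int incx, float *y) {
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i * incx];
}
```

Calling saxpy_ref(n, s, &one, 0, y) with one = 1.0f adds s to every element of y; the ones-vector variant is the portable choice for real BLAS libraries.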
6 votes · 1 answer

Will the cublas kernel functions automatically be synchronized with the host?

Just a general question about cuBLAS: for a single thread, if there is no memory transfer from GPU to CPU (e.g. cublasGetVector), will the cuBLAS kernel functions (e.g. cublasDgemm) automatically be synchronized with the host? …
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
5 votes · 3 answers

Doing multiple matrix-matrix multiplications in one operation

I'm implementing an algorithm that, in essence, is a series of matrix-matrix multiplications like this: Res = M1.M2.M3. ... .Mn. My matrices are really small, 100x100 floats, but the sequence is really long, on the order of billions. I tried using…
Martin Kristiansen
  • 9,875
  • 10
  • 51
  • 83
5 votes · 2 answers

CUBLAS - matrix addition.. how?

I am trying to use CUBLAS to sum two big matrices of unknown size. I need fully optimized code (if possible), so I chose not to rewrite the matrix-addition code (simple) but to use CUBLAS, in particular the cublasSgemm function, which allows summing A…
Marco A.
  • 43,032
  • 26
  • 132
  • 246
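Abusing cublasSgemm for addition works but pays for a full multiply; the cuBLAS extension routine cublasSgeam (added around CUDA 5.0) computes C = α·op(A) + β·op(B) directly. Its no-transpose semantics for column-major m x n matrices are simply:

```c
#include <assert.h>

/* Reference for cublasSgeam with CUBLAS_OP_N on both operands:
 * C = alpha * A + beta * B, all matrices column-major m x n.
 * alpha = beta = 1.0f gives plain matrix addition. */
static void sgeam_ref(int m, int n, float alpha, const float *A,
                      float beta, const float *B, float *C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            C[j * m + i] = alpha * A[j * m + i] + beta * B[j * m + i];
}
```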