Questions tagged [blas]

The Basic Linear Algebra Subprograms are a standard set of interfaces for low-level vector and matrix operations commonly used in scientific computing.

A reference implementation is available at Netlib; optimized implementations (for example ATLAS, OpenBLAS, and Intel MKL) are also available for most high-performance computing architectures.

The BLAS routines are divided into three levels (a minimal call from each level is sketched after the list):

  • Level 1: vector operations e.g. vector addition, dot product
  • Level 2: matrix-vector operations e.g. matrix-vector multiplication
  • Level 3: matrix-matrix operations e.g. matrix multiplication
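
As a rough illustration, one routine from each level can be called through SciPy's BLAS wrappers (scipy.linalg.blas); the routine names daxpy, dgemv, and dgemm are standard BLAS, while reaching them through SciPy is just one convenient option assumed for this sketch.

```
import numpy as np
from scipy.linalg import blas

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
x = np.random.rand(3)
y = np.random.rand(3)

# Level 1 (vector-vector): daxpy computes y := a*x + y
y = blas.daxpy(x, y, a=2.0)

# Level 2 (matrix-vector): dgemv computes y := alpha*A*x + beta*y
v = blas.dgemv(1.0, A, x)        # result has shape (4,)

# Level 3 (matrix-matrix): dgemm computes C := alpha*A*B + beta*C
C = blas.dgemm(1.0, A, B)        # result has shape (4, 5)
```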
906 questions
8 votes · 1 answer

Is GEMM or BLAS used in Tensorflow, Theano, Pytorch

I know that Caffe uses GEneral Matrix to Matrix Multiplication (GEMM), which is part of the Basic Linear Algebra Subprograms (BLAS) library, for performing convolution operations, where a convolution is converted to a matrix multiplication operation. I have…
Gaurav Srivastava
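
The im2col-plus-GEMM idea the question refers to can be sketched in plain NumPy; the function below is a hypothetical illustration (single channel, "valid" cross-correlation as deep-learning frameworks use it), not how Caffe or any particular framework actually implements it, and the final matrix product is exactly the GEMM that a BLAS library would perform.

```
import numpy as np

def conv2d_as_gemm(image, filters):
    """Single-channel 'valid' convolution expressed as one matrix product."""
    K, kh, kw = filters.shape
    H, W = image.shape
    oh, ow = H - kh + 1, W - kw + 1

    # im2col: each column holds one kh*kw patch of the image
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = image[i:i + kh, j:j + kw].ravel()

    # One GEMM computes all K feature maps at once
    out = filters.reshape(K, kh * kw) @ cols
    return out.reshape(K, oh, ow)

img = np.arange(25, dtype=float).reshape(5, 5)
flt = np.ones((2, 3, 3))
maps = conv2d_as_gemm(img, flt)
assert maps.shape == (2, 3, 3)
assert np.isclose(maps[0, 0, 0], img[0:3, 0:3].sum())  # all-ones filter sums the patch
```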
8 votes · 4 answers

LAPACK/BLAS versus simple "for" loops

I want to migrate a piece of code that involves a number of vector and matrix calculations to C or C++, the objective being to speed up the code as much as possible. Are linear algebra calculations with for loops in C code as fast as calculations…
behzad.nouri
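
The question is about C/C++, but the gap between hand-written loops and a BLAS-backed call can be sketched from Python, where a 2-D float `@` product dispatches to the linked BLAS GEMM. The naive loop below overstates the gap because it also pays interpreter overhead; even in compiled C, though, an optimized BLAS usually wins through cache blocking and SIMD. Sizes and timings are illustrative only.

```
import time
import numpy as np

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

def naive_matmul(A, B):
    """Textbook triple loop, the 'simple for loops' approach."""
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for k in range(A.shape[1]):
            aik = A[i, k]
            for j in range(B.shape[1]):
                C[i, j] += aik * B[k, j]
    return C

t0 = time.perf_counter()
C1 = naive_matmul(A, B)
t1 = time.perf_counter()
C2 = A @ B                      # dispatches to the linked BLAS dgemm
t2 = time.perf_counter()

assert np.allclose(C1, C2)
print(f"naive loops: {t1 - t0:.3f}s, BLAS GEMM: {t2 - t1:.5f}s")
```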
8 votes · 3 answers

BLAS matrix by matrix transpose multiply

I have to calculate some products of the form A'A or, more generally, A'DA, where A is a general m×n matrix and D is a diagonal m×m matrix. Both of them are full rank, i.e. rank(A) = min(m,n). I know that you can save substantial time in such symmetric…
enanone
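
The symmetric saving the question alludes to is exactly what DSYRK provides: it computes A'A while touching only one triangle of the result, and A'DA reduces to the same routine by folding sqrt(D) into A (assuming D has nonnegative entries). A minimal sketch via SciPy's BLAS wrapper; in C or Fortran the call would be cblas_dsyrk/dsyrk directly.

```
import numpy as np
from scipy.linalg.blas import dsyrk

m, n = 6, 4
A = np.asfortranarray(np.random.rand(m, n))
d = np.random.rand(m)                    # diagonal of D, assumed nonnegative here

# A'A: dsyrk with trans=1 computes alpha * A^T A, filling only the upper triangle
C = dsyrk(1.0, A, trans=1)
C = np.triu(C) + np.triu(C, 1).T         # symmetrize for the comparison below
assert np.allclose(C, A.T @ A)

# A'DA: fold sqrt(D) into A, then reuse the same rank-k update (B'B = A'DA)
B = A * np.sqrt(d)[:, None]
G = dsyrk(1.0, B, trans=1)
G = np.triu(G) + np.triu(G, 1).T
assert np.allclose(G, A.T @ np.diag(d) @ A)
```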
8 votes · 0 answers

Do numpy or scipy implement sub-cubic multiplication

I've searched quite a bit, but I've only found homegrown reimplementations of Strassen matrix multiplication. Wikipedia says that numpy uses BLAS (which includes high-performance implementations of sub-cubic matrix multiplication algorithms, e.g.…
user
8 votes · 1 answer

How to perform Vector-Matrix Multiplication with BLAS?

BLAS defines the GEMV (Matrix-Vector Multiplication) level-2 operation. How do I use a BLAS library to perform Vector-Matrix Multiplication? It's probably obvious, but I don't see how to use a BLAS operation for this multiplication. I would have…
Baptiste Wicht
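
There is no separate routine for the vector-matrix case: GEMV with the transpose flag computes x'A (equivalently A'x), so the same level-2 call covers both orientations. A small sketch through SciPy's dgemv wrapper; with the C interface the corresponding argument would be CblasTrans.

```
import numpy as np
from scipy.linalg.blas import dgemv

A = np.random.rand(5, 3)
x = np.random.rand(5)            # length matches the number of rows of A

# Vector-matrix product x^T A == A^T x: GEMV with trans=1
y = dgemv(1.0, A, x, trans=1)    # result has shape (3,)
assert np.allclose(y, x @ A)
```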
8 votes · 1 answer

OpenBLAS routine used from R/Rcpp runs only on a single core in Linux

I am trying to run a QR decomposition (LAPACKE_dgeqrf) in R on a Linux machine (CentOS) using a C++ program that is interfaced with Rcpp. Unfortunately, I see only 100% CPU usage in top. This also happens on a Red Hat Enterprise Linux Server. However, the…
chris
8 votes · 1 answer

dgemm segfaulting with large F-order matrices in scipy

I'm attempting to compute A*A.T in Python using SciPy's dgemm, but I get a segfault when A has a large row dimension (~50,000) and I pass the matrices in F-order. Of course, the resulting matrix is very large, but both sgemm and passing to dgemm…
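
One way to form A·Aᵀ with SciPy's dgemm while staying in Fortran order (so the wrapper does not silently copy the operands) is to pass the same F-ordered array twice and set trans_b; the sizes below are small stand-ins for the ~50,000-row case in the question, and for this particular product dsyrk would also be a natural, slightly cheaper choice.

```
import numpy as np
from scipy.linalg.blas import dgemm

A = np.asfortranarray(np.random.rand(1000, 20))   # stand-in for the large matrix

# C = A * A.T without materializing the transpose: dgemm with trans_b=True.
# F-ordered float64 inputs can be used by dgemm directly, without copies.
C = dgemm(alpha=1.0, a=A, b=A, trans_b=True)
assert C.shape == (1000, 1000)
assert np.allclose(C, A @ A.T)
```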
8 votes · 2 answers

R loop getting slower and slower

I am struggling to understand why this bit of code (adapted from the R Benchmark 2.5) becomes slower and slower (on average) as the number of iterations increases. require(Matrix) c <- 0; for (i in 1:100) { a <- new("dgeMatrix", x = rnorm(3250 *…
RenéR
8 votes · 5 answers

Numpy and Scipy installation on Windows

I have installed Numpy successfully. But on the site, there are a lot of things that I have to do, such as building Numpy and Scipy and downloading ATLAS, LAPACK, etc. I am really confused, and even though I have checked some of the other queries, I am still not able…
Hemant
8 votes · 1 answer

Are BLAS Level 1 procedures still relevant for modern Fortran compilers?

Most of the BLAS Level 1 API can be written straightforwardly using Fortran 9x+ vectorized assignments and intrinsic procedures. Assuming you are using a modern optimizing compiler, like Intel Fortran, and the correct target-specific compiler…
abbot
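
The question is about Fortran, but the comparison it makes, a library Level-1 call versus the equivalent vectorized expression, can be sketched in Python for concreteness: both spellings below compute the same axpy update. Whether a compiler turns the vectorized form into code as fast as a tuned daxpy is exactly the open question and is not settled by this sketch.

```
import numpy as np
from scipy.linalg.blas import daxpy

a = 2.5
x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)

# Library route: BLAS Level 1 daxpy, y := a*x + y
y_blas = daxpy(x, y.copy(), a=a)

# Vectorized route: the expression a Fortran 90+ compiler would see directly
y_vec = a * x + y

assert np.allclose(y_blas, y_vec)
```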
8 votes · 2 answers

How to accelerate matrix multiplications in Python?

I am developing a small neural network whose parameters need a lot of optimization, and thus a lot of processing time. I have profiled my script with cProfile, and what takes 80% of the processor time is the NumPy dot function; the rest is matrix inversion…
PierreE
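
NumPy's dot/`@` already hands 2-D float products to the linked BLAS GEMM, so the usual gains come from making sure that path is taken: check which BLAS NumPy was built against, keep operands as contiguous float arrays, and batch many small products into one large one. A rough checklist in code; the actual speedups depend entirely on the BLAS in use.

```
import numpy as np

# 1. See which BLAS NumPy is linked against (OpenBLAS, MKL, reference BLAS, ...)
np.show_config()

# 2. Contiguous float operands let dot/@ dispatch straight to GEMM
A = np.ascontiguousarray(np.random.rand(500, 500))
B = np.ascontiguousarray(np.random.rand(500, 500))
C = A @ B                        # one GEMM call

# 3. Batch work: one large product instead of many small ones
xs = np.random.rand(1000, 64)    # 1000 input vectors stacked as rows
W = np.random.rand(64, 32)
out = xs @ W                     # one GEMM instead of 1000 separate GEMVs
```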
7 votes · 1 answer

Difference between dtrtrs and dtrsm

I am looking for some triangular solvers, and I have come across two: one in BLAS, dtrsm, and another in LAPACK, dtrtrs. From the looks of it both seem to have common functionality, with dtrsm having a little bit more functionality (scaling…
Pavan Yalamanchili
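
Both routines solve triangular systems: dtrsm (BLAS Level 3) also folds a scaling factor alpha into the solve and lets the triangular matrix sit on either side, while dtrtrs (LAPACK) returns an info code and rejects exactly singular triangles. A minimal side-by-side through SciPy's wrappers, purely to show they agree on a well-conditioned system.

```
import numpy as np
from scipy.linalg.blas import dtrsm
from scipy.linalg.lapack import dtrtrs

n = 5
L = np.tril(np.random.rand(n, n)) + n * np.eye(n)   # well-conditioned lower triangle
B = np.random.rand(n, 3)

# BLAS: solve L X = alpha * B, with alpha handled inside the routine
X_blas = dtrsm(1.0, L, B, lower=1)

# LAPACK: solve L X = B, with an info flag reporting exact singularity
X_lapack, info = dtrtrs(L, B, lower=1)

assert info == 0
assert np.allclose(X_blas, X_lapack)
assert np.allclose(L @ X_lapack, B)
```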
7 votes · 1 answer

In R, how to control multi-threading in the BLAS parallel matrix product

I have a question regarding the use of the BLAS-parallelized matrix product in R (the default matrix product at least since R 3.4, maybe earlier). The default behavior (at least on my machine) is now for the matrix product (cf. example below) to…
Odin
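
For OpenBLAS the thread count is usually taken from the OPENBLAS_NUM_THREADS environment variable (OMP_NUM_THREADS for OpenMP builds, MKL_NUM_THREADS for MKL), and that applies whichever language loads the library, R included; setting the variable before R starts has the same effect as the Python sketch below, which assumes an OpenBLAS-backed NumPy purely for illustration.

```
import os

# Must be set before the BLAS library is loaded, i.e. before importing numpy
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # assumption: OpenBLAS-backed NumPy
os.environ["OMP_NUM_THREADS"] = "1"        # covers OpenMP-threaded builds

import numpy as np

A = np.random.rand(2000, 2000)
B = np.random.rand(2000, 2000)
C = A @ B   # this GEMM should now run on a single thread
```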
7 votes · 3 answers

How can I make use of intel-mkl with tensorflow

I've seen a lot of documentation about making use of a CPU with tensorflow; however, I don't have a GPU. What I do have is a fairly capable CPU and 5 GB of the Intel Math Kernel Library, which, I hope, might help me speed up tensorflow a fair…
George H
7 votes · 1 answer

Spark netlib-java BLAS

I am trying to troubleshoot my non-working Apache Spark and netlib setup, and I don't know what to do next. Here is some info: Spark 1.3.1 (but also tried 1.5.1), a Mesos cluster with 3 nodes, Ubuntu Trusty on every node, and installed the following BLAS…
wobu