Questions tagged [blas]

The Basic Linear Algebra Subprograms (BLAS) are a standard set of interfaces for low-level vector and matrix operations commonly used in scientific computing.

A reference implementation is available at Netlib; optimized implementations are also available for most high-performance computing architectures, for example OpenBLAS, ATLAS, Intel MKL, and Apple's Accelerate framework.

The BLAS routines are divided into three levels:

  • Level 1: vector operations, e.g. vector addition, dot product
  • Level 2: matrix-vector operations, e.g. matrix-vector multiplication
  • Level 3: matrix-matrix operations, e.g. matrix multiplication
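The three levels can be seen directly through SciPy's low-level BLAS wrappers. A minimal sketch (assumes SciPy with a BLAS-backed build; the `d` prefix selects the double-precision routines):

```python
import numpy as np
from scipy.linalg import blas

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.eye(3)

# Level 1: vector-vector, O(n) work, e.g. the dot product
d = blas.ddot(x, y)        # 1*4 + 2*5 + 3*6 = 32.0

# Level 2: matrix-vector, O(n^2) work, e.g. y := alpha*A@x
v = blas.dgemv(1.0, A, x)  # identity times x -> [1.0, 2.0, 3.0]

# Level 3: matrix-matrix, O(n^3) work, e.g. C := alpha*A@B
C = blas.dgemm(1.0, A, A)  # identity times identity -> identity
```

The level number roughly tracks the arithmetic intensity: Level 3 routines do O(n³) work on O(n²) data, which is why optimized GEMM implementations dominate BLAS performance tuning.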
906 questions
6 votes, 1 answer

CUBLAS matrix multiplication

After implementing matrix multiplication with CUDA, I tried to implement it with CUBLAS (thanks to the advice of some people here in the forum). I can multiply square matrices, but (yes, once again...) I am having difficulties working with non-square…
Bernardo
6 votes, 1 answer

How do I multiply a matrix with a vector in gonum?

I want to multiply a mat.Dense matrix with a mat.VecDense vector, but apparently neither mat.Dense nor mat.VecDense implements the Matrix interface or defines a method to multiply a matrix with a vector. How would I do that?
user8725011
6 votes, 0 answers

Why are dgemm and sgemm much slower (200x) than numpy's dot?

Why are dgemm and sgemm much slower (200x) than numpy's dot? Is this expected and normal? The following is the code I use to test: from scipy.linalg import blas import numpy as np import time x2 = np.zeros((1000000, 512)) x1 = np.zeros((1, 512)) t1…
user2675516
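A common explanation for slowdowns like the one above is array memory layout rather than the BLAS itself: SciPy's low-level wrappers expect Fortran-ordered arrays and silently copy C-ordered (NumPy-default) input on every call. A hedged sketch, not a diagnosis of the exact benchmark in the question:

```python
import numpy as np
from scipy.linalg import blas

a = np.random.rand(500, 300)  # C-ordered, NumPy's default layout
b = np.random.rand(300, 400)

# Both calls compute the same product; the first may incur hidden
# layout-conversion copies, the second passes Fortran-ordered data
# straight through to dgemm.
c_copying = blas.dgemm(1.0, a, b)
c_direct  = blas.dgemm(1.0, np.asfortranarray(a), np.asfortranarray(b))
```

Another frequent cause of large gaps is comparing builds linked against different BLAS libraries (reference BLAS vs an optimized one), which timing alone cannot distinguish.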
6 votes, 0 answers

Fastest code for Hadamard product

Given two complex rank-2 arrays, I want to calculate a pointwise multiplication (Hadamard product): complex(8) :: A(N,N), B(N,N), C(N,N) ... do j = 1, N do i = 1, N C(i,j) = A(i,j)*B(i,j) enddo enddo Is there any BLAS routine to…
thyme
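For reference, standard BLAS has no dedicated Hadamard-product routine; the operation is memory-bound, so a vectorized elementwise loop is typically as fast as any library call. A NumPy sketch of the same computation as the Fortran double loop above (the size N is an illustrative choice):

```python
import numpy as np

N = 4
A = np.random.rand(N, N) + 1j * np.random.rand(N, N)
B = np.random.rand(N, N) + 1j * np.random.rand(N, N)

# Elementwise (Hadamard) product: C[i, j] = A[i, j] * B[i, j]
C = A * B
```

In Fortran the whole-array expression `C = A*B` compiles to the same elementwise loop, so there is little to gain from hunting for a BLAS call here.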
6 votes, 1 answer

Java best practices for vectorized computations

I'm researching methods for computing expensive vector operations in Java, e.g. dot-products or multiplications between large matrices. There are a few good threads on here on this topic, like this and this. It appears that there is no reliable way…
blackgreen
6 votes, 2 answers

iOS 4 Accelerate Cblas with 4x4 matrices

I’ve been looking into the Accelerate framework that was made available in iOS 4. Specifically, I made some attempts to use the Cblas routines in my linear algebra library in C. Now I can’t get the use of these functions to give me any performance…
6 votes, 1 answer

'Symbol lookup error' with netlib-java

Background & Problem I am having a bit of trouble running the examples in Spark's MLLib on a machine running Fedora 23. I have built Spark 1.6.2 with the following options per Spark documentation: build/mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.4 \ …
Addison
6 votes, 3 answers

Without root access, run R with tuned BLAS when it is linked with reference BLAS

Can anyone tell me why I cannot successfully test OpenBLAS's dgemm performance (in GFLOPs) in R in the following way: link R with the "reference BLAS" libblas.so; compile my C program mmperf.c with the OpenBLAS library libopenblas.so; load the…
Zheyuan Li
6 votes, 0 answers

Why can R be linked to a shared BLAS later even if it was built with `--with-blas = lblas`?

The BLAS section in the R Installation and Administration manual says that when R is built from source with the configuration parameter --without-blas, it will build Netlib's reference BLAS into a standalone shared library at R_HOME/lib/libRblas.so, along…
Zheyuan Li
6 votes, 1 answer

How much faster is Eigen for small fixed size matrices?

I'm using Julia at the moment, but I have a performance-critical function which requires an enormous number of repeated matrix operations on small fixed-size matrices (3 dimensional or 4 dimensional). It seems that all the matrix operations in Julia…
Lindon
6 votes, 1 answer

Multiplying three matrices in BLAS with the middle one being diagonal

A is an MxK matrix, B is a vector of size K, and C is a KxN matrix. What set of BLAS operators should I use to compute the matrix below? M = A*diag(B)*C. One way to implement this would be using three for loops, like below: for (int i=0; i
D R
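The product A·diag(B)·C never requires materializing the KxK diagonal matrix: A·diag(B) just scales column k of A by B[k], which costs O(M·K), and is then followed by one ordinary GEMM. A NumPy sketch (dimensions are illustrative):

```python
import numpy as np

M, K, N = 3, 4, 5
A = np.random.rand(M, K)
B = np.random.rand(K)      # the diagonal, stored as a plain vector
C = np.random.rand(K, N)

# Broadcasting A * B scales column k of A by B[k], so this equals
# A @ np.diag(B) @ C without ever forming diag(B).
result = (A * B) @ C
```

In raw BLAS terms, the column scaling corresponds to K calls to dscal (one per column of A), followed by a single dgemm.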
6 votes, 2 answers

Bignum, Linear Algebra and Digital Signal Processing on iPhone OS (iOS 4)

I think I've found some gems in the iPhone OS (iOS 4). I found that there are 128-bit, 256-bit, 512-bit and 1024-bit integer data types provided by the Accelerate framework. There are also Apple's implementations of the Basic Linear Algebra Subprograms…
6 votes, 0 answers

blas/lapack/atlas in numpy on fedora

I've compiled and installed numpy successfully. But when I run import numpy.distutils.system_info as sysinfo followed by sysinfo.get_info('atlas'), all I get is: lapack_info: NOT AVAILABLE lapack_opt_info: NOT…
egievs
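As an aside, a simpler way to inspect which BLAS/LAPACK NumPy was linked against is `np.show_config()`; note that `numpy.distutils` is deprecated in recent NumPy releases, so the approach in the excerpt above may stop working:

```python
import numpy as np

# Prints the BLAS/LAPACK build configuration NumPy was compiled with,
# including library names and search paths.
np.show_config()
```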
6 votes, 1 answer

What memory access patterns are most efficient for outer-product-type double loops?

What access patterns are most efficient for writing cache-efficient, outer-product-type code that maximally exploits data locality? Consider a block of code for processing all pairs of elements of two arrays, such as: for (int i = 0; i < N; i++) …
Robert T. McGibbon
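In a NumPy setting, the all-pairs double loop above maps onto broadcasting, which walks both operands in contiguous, cache-friendly order. A sketch (the excerpt truncates the loop body, so the pairwise difference here is only an assumed placeholder):

```python
import numpy as np

x = np.arange(4, dtype=float)
y = np.arange(3, dtype=float)

# pairs[i, j] = x[i] - y[j] for every pair, without an explicit loop.
# The inner (j) index varies fastest, matching the row-major layout.
pairs = x[:, None] - y[None, :]   # shape (4, 3)
```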
6 votes, 4 answers

Armadillo+OpenBLAS slower than MATLAB?

New to SO. I am test-driving Armadillo+OpenBLAS, and a simple Monte Carlo geometric-Brownian-motion simulation shows a much longer runtime than MATLAB. I believe something must be wrong. Environment: Intel i5, 4 cores, 8GB RAM, VS 2012 Express, Armadillo…
AndreasBVB