Questions tagged [blas]

The Basic Linear Algebra Subprograms (BLAS) are a standard set of interfaces for low-level vector and matrix operations commonly used in scientific computing.

A reference implementation is available at Netlib; optimized implementations are also available for most high-performance computing architectures, for example OpenBLAS, ATLAS, Intel MKL, and Apple's Accelerate framework.

The BLAS routines are divided into three levels:

  • Level 1: vector operations, e.g. vector addition, dot product
  • Level 2: matrix-vector operations, e.g. matrix-vector multiplication
  • Level 3: matrix-matrix operations, e.g. matrix multiplication
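The three levels can be seen directly through SciPy's low-level BLAS wrappers. A minimal sketch (assumes SciPy with a BLAS-backed build; the `d` prefix selects the double-precision routines):

```python
import numpy as np
from scipy.linalg import blas

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.eye(3)

# Level 1: vector-vector, O(n) work, e.g. the dot product
d = blas.ddot(x, y)        # 1*4 + 2*5 + 3*6 = 32.0

# Level 2: matrix-vector, O(n^2) work, e.g. y := alpha*A@x
v = blas.dgemv(1.0, A, x)  # identity times x -> [1.0, 2.0, 3.0]

# Level 3: matrix-matrix, O(n^3) work, e.g. C := alpha*A@B
C = blas.dgemm(1.0, A, A)  # identity times identity -> identity
```

The level number roughly tracks the arithmetic intensity: Level 3 routines do O(n³) work on O(n²) data, which is why optimized GEMM implementations dominate BLAS performance tuning.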
906 questions
6 votes, 1 answer

CUBLAS matrix multiplication

After implementing matrix multiplication with CUDA, I tried to implement it with CUBLAS (thanks to the advice of some people here in the forum). I can multiply square matrices, but (yes, once again...) I am having difficulties working with non-square…
Bernardo
6 votes, 1 answer

How do I multiply a matrix with a vector in gonum?

I want to multiply a mat.Dense matrix with a mat.VecDense vector, but apparently neither mat.Dense nor mat.VecDense implements the Matrix interface or defines a method to multiply a matrix with a vector. How would I do that?
user8725011
6 votes, 0 answers

Why are dgemm and sgemm much slower (200x) than numpy's dot?

Why are dgemm and sgemm much slower (200x) than numpy's dot? Is this expected and normal? The following is the code I use to test: from scipy.linalg import blas import numpy as np import time x2 = np.zeros((1000000, 512)) x1 = np.zeros((1, 512)) t1…
user2675516
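A common explanation for slowdowns like the one above is array memory layout rather than the BLAS itself: SciPy's low-level wrappers expect Fortran-ordered arrays and silently copy C-ordered (NumPy-default) input on every call. A hedged sketch, not a diagnosis of the exact benchmark in the question:

```python
import numpy as np
from scipy.linalg import blas

a = np.random.rand(500, 300)  # C-ordered, NumPy's default layout
b = np.random.rand(300, 400)

# Both calls compute the same product; the first may incur hidden
# layout-conversion copies, the second passes Fortran-ordered data
# straight through to dgemm.
c_copying = blas.dgemm(1.0, a, b)
c_direct  = blas.dgemm(1.0, np.asfortranarray(a), np.asfortranarray(b))
```

Another frequent cause of large gaps is comparing builds linked against different BLAS libraries (reference BLAS vs an optimized one), which timing alone cannot distinguish.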
6 votes, 0 answers

Fastest code for Hadamard product

Given two complex rank-2 arrays, I want to calculate a pointwise multiplication (Hadamard product): complex(8) :: A(N,N), B(N,N), C(N,N) ... do j = 1, N do i = 1, N C(i,j) = A(i,j)*B(i,j) enddo enddo Is there any BLAS routine to…
thyme
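For reference, standard BLAS has no dedicated Hadamard-product routine; the operation is memory-bound, so a vectorized elementwise loop is typically as fast as any library call. A NumPy sketch of the same computation as the Fortran double loop above (the size N is an illustrative choice):

```python
import numpy as np

N = 4
A = np.random.rand(N, N) + 1j * np.random.rand(N, N)
B = np.random.rand(N, N) + 1j * np.random.rand(N, N)

# Elementwise (Hadamard) product: C[i, j] = A[i, j] * B[i, j]
C = A * B
```

In Fortran the whole-array expression `C = A*B` compiles to the same elementwise loop, so there is little to gain from hunting for a BLAS call here.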
6 votes, 1 answer

Java best practices for vectorized computations

I'm researching methods for computing expensive vector operations in Java, e.g. dot-products or multiplications between large matrices. There are a few good threads on here on this topic, like this and this. It appears that there is no reliable way…
blackgreen
6 votes, 2 answers

iOS 4 Accelerate Cblas with 4x4 matrices

I’ve been looking into the Accelerate framework that was made available in iOS 4. Specifically, I made some attempts to use the Cblas routines in my linear algebra library in C. Now I can’t get the use of these functions to give me any performance…
6 votes, 1 answer

'Symbol lookup error' with netlib-java

Background & Problem I am having a bit of trouble running the examples in Spark's MLLib on a machine running Fedora 23. I have built Spark 1.6.2 with the following options per Spark documentation: build/mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.4 \ …
Addison
6 votes, 3 answers

Without root access, run R with tuned BLAS when it is linked with reference BLAS

Can anyone tell me why I cannot successfully test OpenBLAS's dgemm performance (in GFLOPs) in R in the following way: link R with the "reference BLAS" libblas.so; compile my C program mmperf.c with the OpenBLAS library libopenblas.so; load the…
Zheyuan Li
6 votes, 0 answers

Why can R be linked to a shared BLAS later even if it was built with `--with-blas = lblas`?

The BLAS section in the R Installation and Administration manual says that when R is built from source with the configuration parameter --without-blas, it will build Netlib's reference BLAS into a standalone shared library at R_HOME/lib/libRblas.so, along…
Zheyuan Li
6 votes, 1 answer

How much faster is Eigen for small fixed size matrices?

I'm using Julia at the moment, but I have a performance-critical function which requires an enormous number of repeated matrix operations on small fixed-size matrices (3 dimensional or 4 dimensional). It seems that all the matrix operations in Julia…
Lindon
6 votes, 1 answer

Multiplying three matrices in BLAS with the middle one being diagonal

A is an MxK matrix, B is a vector of size K, and C is a KxN matrix. What set of BLAS operators should I use to compute the matrix below? M = A*diag(B)*C. One way to implement this would be using three for loops, like below: for (int i=0; i
D R
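The product A·diag(B)·C never requires materializing the KxK diagonal matrix: A·diag(B) just scales column k of A by B[k], which costs O(M·K), and is then followed by one ordinary GEMM. A NumPy sketch (dimensions are illustrative):

```python
import numpy as np

M, K, N = 3, 4, 5
A = np.random.rand(M, K)
B = np.random.rand(K)      # the diagonal, stored as a plain vector
C = np.random.rand(K, N)

# Broadcasting A * B scales column k of A by B[k], so this equals
# A @ np.diag(B) @ C without ever forming diag(B).
result = (A * B) @ C
```

In raw BLAS terms, the column scaling corresponds to K calls to dscal (one per column of A), followed by a single dgemm.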
6 votes, 2 answers

Bignum, Linear Algebra and Digital Signal Processing on iPhone OS (iOS 4)

I think I've found some gems in the iPhone OS (iOS 4). I found that there are 128-bit, 256-bit, 512-bit and 1024-bit integer data types provided by the Accelerate framework. There are also Apple's implementations of the Basic Linear Algebra Subprograms…
6 votes, 0 answers

blas/lapack/atlas in numpy on fedora

I've compiled and installed numpy successfully. But when I run import numpy.distutils.system_info as sysinfo followed by sysinfo.get_info('atlas'), all I get is: lapack_info: NOT AVAILABLE lapack_opt_info: NOT…
egievs
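As an aside, a simpler way to inspect which BLAS/LAPACK NumPy was linked against is `np.show_config()`; note that `numpy.distutils` is deprecated in recent NumPy releases, so the approach in the excerpt above may stop working:

```python
import numpy as np

# Prints the BLAS/LAPACK build configuration NumPy was compiled with,
# including library names and search paths.
np.show_config()
```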
6 votes, 1 answer

What memory access patterns are most efficient for outer-product-type double loops?

What access patterns are most efficient for writing cache-efficient, outer-product-type code that maximally exploits data locality? Consider a block of code for processing all pairs of elements of two arrays, such as: for (int i = 0; i < N; i++) …
Robert T. McGibbon
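In a NumPy setting, the all-pairs double loop above maps onto broadcasting, which walks both operands in contiguous, cache-friendly order. A sketch (the excerpt truncates the loop body, so the pairwise difference here is only an assumed placeholder):

```python
import numpy as np

x = np.arange(4, dtype=float)
y = np.arange(3, dtype=float)

# pairs[i, j] = x[i] - y[j] for every pair, without an explicit loop.
# The inner (j) index varies fastest, matching the row-major layout.
pairs = x[:, None] - y[None, :]   # shape (4, 3)
```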
6 votes, 4 answers

Armadillo+OpenBLAS slower than MATLAB?

New to SO. I am test-driving Armadillo+OpenBLAS, and a simple Monte Carlo geometric-Brownian-motion simulation shows a much longer runtime than MATLAB. I believe something must be wrong. Environment: Intel i5, 4 cores, 8GB RAM, VS 2012 Express, Armadillo…
AndreasBVB