
I'm in a situation where I have to perform some linear algebra calculations in C++ with a matrix that almost never changes and a lot of small vectors (very, very few 3x3 or 4x4 matrices, plus vectors with 3 values). I was thinking about using some CPU instruction set for x86 32-bit, x86 64-bit, and ARMv5 and above to speed things up and simplify the design of my math operations.

Surprisingly, I haven't found a real instruction set for linear algebra: most of them are for floating-point math (cached and optimized as much as you want), but nothing really for matrices and linear algebra. Is that just me, or is there no instruction set for linear algebra?

The new FMA3 from AMD looks interesting as a starting point, but it's still far too rare in modern CPUs; I would like to stick to something as popular as SSE on x86 or the ARMv5 instructions on ARM.

So, is there a popular instruction set for small and quick linear algebra computations? I could even accept a fair amount of error if the speed is good enough.
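
To make the scale of the problem concrete, the kind of operation I mean is a single small product like the one below, written directly with SSE intrinsics for the x86 case. This is just a sketch, assuming 16-byte-aligned, column-major data; the function name is only illustrative.

```cpp
// Sketch: 4x4 column-major matrix times 4-component vector with SSE intrinsics.
// Assumes 16-byte-aligned input.
#include <xmmintrin.h>

void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    __m128 c0 = _mm_load_ps(&m[0]);   // column 0
    __m128 c1 = _mm_load_ps(&m[4]);   // column 1
    __m128 c2 = _mm_load_ps(&m[8]);   // column 2
    __m128 c3 = _mm_load_ps(&m[12]);  // column 3

    // out = v[0]*c0 + v[1]*c1 + v[2]*c2 + v[3]*c3
    __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
    _mm_store_ps(out, r);
}
```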

EDIT:

I should also note that in practice my compilers are:

  • gcc
  • mingw
  • Visual Studio

so I would like an open-source product and a library that is portable across both x86 and ARM.

EDIT 2: Eigen doesn't support multithreaded execution, which is a big downside for me.
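
For reference, the fixed-size Eigen usage I was comparing against looks roughly like this; a minimal sketch assuming Eigen 3 and its `Eigen/Dense` header.

```cpp
// Minimal sketch of fixed-size Eigen types for this problem (assumes Eigen 3).
#include <Eigen/Dense>

int main()
{
    Eigen::Matrix3f m;                 // 3x3, fixed size, stack-allocated
    m << 1, 2, 3,
         4, 5, 6,
         7, 8, 9;

    Eigen::Vector3f v(1.0f, 0.0f, 0.0f);
    Eigen::Vector3f r = m * v;         // matrix-vector product
    return r(0) > 0.0f ? 0 : 1;
}
```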

user2485710
  • How about using a library that already did the work for you, like Eigen? Vector instructions are very low level, they don't do linear algebra, they do basic operations on top of which you can build many things, including linear algebra. – Marc Glisse Jun 29 '13 at 15:58
  • @MarcGlisse I was considering that, but I haven't found anything that looks like what I need in terms of simplicity; for example, there are a lot of benchmarks with a bozillion of huge matrices, but I just need a very fast computation done with really small data structures. I was also considering a wrapper like Boost uBLAS that can be linked to other libs like Intel MKL, but Eigen looks like it's more optimized for both ARM and x86. I have to use this in an imaging-related program, so I can even afford a significant error threshold if the operation is fast. – user2485710 Jun 29 '13 at 16:10
  • It doesn't look like you have more than a superficial understanding of what a CPU instruction set is. In particular, instruction sets aren't designed to "simplify the design of my math operations" or be "an open source product" or "portable library on both x86 and ARM". x86 and ARM are different instruction set architectures, so by definition the instruction set is not portable between them. Instruction sets provide small building blocks for writing algorithms to solve problems; they aren't made for solving non-trivial problems. – Ben Voigt Jun 29 '13 at 17:05
  • Ironically, multiplication of 4 element vectors is the kind of operation you do find in SIMD instruction sets, and nothing for handling a "bozillion of huge matrices" – Ben Voigt Jun 29 '13 at 17:08
  • @BenVoigt you feed it a value and you get a result according to the instruction used; you don't need to design something special or care about how it's implemented. Then, if you switch between ARM and x86, you just change the instruction name for that particular SIMD operation and you don't even have to change your method names. That's what I mean by "simplicity". I know that it's not portable, but it's a simple solution. If you can provide a solution for a C++ program that needs simple and fast linear algebra, I will be happy to see it. – user2485710 Jun 29 '13 at 17:14
  • @user2485710: "Change the instruction name" is what you usually do when switching between different assemblers for the same architecture. In different instruction sets, there are different instructions, different semantics, not merely different names. – Ben Voigt Jun 29 '13 at 17:24
  • @BenVoigt yes, but I don't have to port this to 20 different architectures, and I also don't need a math library that performs every possible operation on matrices and vectors. This is doable for me, and it's also a much better option than rewriting and porting the build system of a given library to another platform, which is usually a much more complex and time-consuming thing. If you want to suggest something, feel free to do that; otherwise we are going nowhere. – user2485710 Jun 29 '13 at 17:54
  • @user2485710: So you don't want a full BLAS, and you don't want an instruction set, even though those are the things your question talks about. You want a basic library for matrix transformation of vectors -- which happens to be very commonly found in 3D graphics code and imaging. So look for a graphics library, not a math library. (After all, you're doing image processing) – Ben Voigt Jun 29 '13 at 20:34

2 Answers


Maybe you already know about this, but for the x86 architecture I can recommend Intel MKL's BLAS over AVX or AVX2. For details, look here: http://software.intel.com/en-us/articles/optimize-for-intel-avx-using-intel-math-kernel-librarys-basic-linear-algebra-subprograms-blas-with-dgemm-routine or here: http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-blas-cblas-and-lapack-compilinglinking-functions-fortran-and-cc-calls
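
A minimal sketch of what one of these small products looks like through the CBLAS interface that MKL exposes (assuming MKL is installed and linked; `cblas_dgemv` is the standard CBLAS matrix-vector routine, and the function name below is only illustrative):

```cpp
// Sketch: y = A * x for a 3x3 row-major matrix via the CBLAS interface
// provided by Intel MKL (assumes MKL headers and libraries are set up).
#include <mkl.h>

void mul3x3(const double A[9], const double x[3], double y[3])
{
    // y = 1.0 * A * x + 0.0 * y
    cblas_dgemv(CblasRowMajor, CblasNoTrans,
                3, 3,        // rows, cols of A
                1.0,         // alpha
                A, 3,        // A and its leading dimension
                x, 1,        // x and its increment
                0.0,         // beta
                y, 1);       // y and its increment
}
```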

Oleksandr Karaberov

You're not actually looking for a full linear algebra library, but just portable vector operations.

Searching for "portable C++ SIMD" turns up plenty of relevant hits. One of the most promising is Vc:

Vc is a free software library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.
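
To make the idea concrete, here is a rough sketch of the kind of abstraction such a library provides. This is not Vc's actual API, just an illustration of why a thin wrapper over the intrinsics keeps the calling code portable (it assumes GCC/Clang-style `__SSE__` and `__ARM_NEON` predefined macros).

```cpp
// Illustrative only: one 4-float vector type whose operations map to SSE on
// x86 and NEON on ARM. Real libraries like Vc provide a richer, tested API.
#if defined(__SSE__)
  #include <xmmintrin.h>
  struct vec4f {
      __m128 v;
      static vec4f load(const float* p)  { return { _mm_loadu_ps(p) }; }
      void store(float* p) const         { _mm_storeu_ps(p, v); }
      vec4f operator+(vec4f o) const     { return { _mm_add_ps(v, o.v) }; }
      vec4f operator*(vec4f o) const     { return { _mm_mul_ps(v, o.v) }; }
  };
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  struct vec4f {
      float32x4_t v;
      static vec4f load(const float* p)  { return { vld1q_f32(p) }; }
      void store(float* p) const         { vst1q_f32(p, v); }
      vec4f operator+(vec4f o) const     { return { vaddq_f32(v, o.v) }; }
      vec4f operator*(vec4f o) const     { return { vmulq_f32(v, o.v) }; }
  };
#endif
```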

Ben Voigt