6

I'm looking for a free/open source C/C++ (either is acceptable) library of vectorized versions of common math functions (such as ln or exp) similar to Intel's Vector Math Library for Linux. I'd like a library that would provide me with the ability to write something like:

double a[ARRAY_SIZE], b[ARRAY_SIZE];
for (int i = 0; i < ARRAY_SIZE; ++i) {
    a[i] = ln(b[i]);
}

as:

double a[ARRAY_SIZE], b[ARRAY_SIZE];
vectorized_ln(a, b, ARRAY_SIZE);

and have it use the full power of the SIMD instructions available on Intel and AMD architectures. The development environment consists of GNU tools running on Linux. Intel's Math Kernel Library contains the Vector Math Library, which advertises "vector implementations of computationally intensive core mathematical functions" including basic functions, trig functions, etc., so I'm looking for something like that, but free.

BD at Rivenhill

4 Answers

8

I developed Yeppp!, an open-source (BSD-licensed) mathematical library that provides some vectorized elementary functions (log, exp, sin, cos, tan) and is competitive with MKL in performance. Here is an example of using the vector logarithm function from Yeppp!

Marat Dukhan
6

Felix von Leitner has written an extensive presentation on the actual assembly produced by various C compilers.

His notes on vectorization of simple operations start on slide 28.

  • For GCC 4.4 and a memset type loop

    • gcc -O2 generates a loop that writes one byte at a time
    • gcc -O3 vectorizes, writes 32-bit (x86) or 128-bit (x86 with SSE or x64) at a time
    • impressive: the vectorized code checks and fixes the alignment first

Slide 41 is entitled "Outsmarting the Compiler - simd-shift" and concludes that "gcc is smarter than the video codec programmer on all platforms".

Slide 42 is another case where gcc will automatically vectorize naive code.

All of which adds up to this: check first whether the compiler you are using will simply handle it for you.
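One way to run that check is on a small memset-type loop like the one discussed in the slides. The sketch below is illustrative only; the function name `fill_bytes` is hypothetical and not taken from the presentation.

```c
#include <stddef.h>

/* A memset-type loop: writes the same byte value to every element.
   fill_bytes is a hypothetical name used only for this illustration. */
void fill_bytes(unsigned char *buf, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        buf[i] = value;
    }
}
```

Compiling this with `gcc -O3 -S` and reading the generated assembly shows what the compiler actually did. On reasonably recent GCC versions, adding `-fopt-info-vec` asks the compiler to report which loops it vectorized. Depending on the version and flags, GCC may also recognize the pattern and replace the whole loop with a call to memset, so the result can differ from what the slides show for GCC 4.4.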

dmckee --- ex-moderator kitten
  • I already intend to write several versions of this program using various approaches, including OpenMP, loops written to be amenable to autovectorization, and possibly intrinsics. Because the functions that I'm working on include exp() and log(), I'm looking for a library that has versions of these functions already vectorized so that I can embed them in my program to see if there is a performance improvement. +1 for the link to the survey though. – BD at Rivenhill Aug 03 '11 at 09:34
  • Fair enough, but I was surprised by some of the transformation and optimizations that are already supported and thought that others might be in the same boat. – dmckee --- ex-moderator kitten Aug 03 '11 at 15:33
4

You might find AMD's LibM library (x64 only, however) combined with SSEPlus to be of use. There is also an open-source x86 variant of Sony's Vector Math library.

Necrolis
  • I will definitely see how libM/SSEPlus might be used for this problem, since AMD is one of our target architectures. It looks like the vector math library has been embedded in the Bullet library for games physics (currently hosted at http://code.google.com/p/bullet/). A quick look at the class hierarchy indicates that it is probably too high level for my purpose, but I will dig deeper into it to be sure. I saw some other interesting stuff in there anyway. – BD at Rivenhill Aug 03 '11 at 10:10
  • @dbliss: Fixed, but annoying that they can't permalink this stuff... – Necrolis Nov 30 '15 at 20:30
3

Besides writing these functions yourself (which isn't exactly rocket science) or using Ignacio's link, Intel's SPMD compiler might be something for you: http://ispc.github.com/

It's a compiler for a C-style language in which you write code in serial/scalar fashion, and it parallelizes that code with a specific target architecture in mind. The resulting functions are easy to call from your regular C++ project.

I quote: "ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs; it frequently provides a 3x or more speedup on CPUs with 4-wide SSE units, without any of the difficulty of writing intrinsics code."

I have yet to try it myself, but it looks good for parallelizing general computations.

nielsj
  • As per the Oil Runtime Compiler above, I hadn't considered writing in a separate language for linking against my code, but I am now. – BD at Rivenhill Aug 03 '11 at 10:12
  • As long as you're not retrofitting in any way, I can imagine it being quite elegant to have a set of miniprograms (not unlike graphics shaders) that handle heavy-duty mathematical tasks. ISPC also has different backends (free, multiplatform), and there will likely be more in the future. – nielsj Aug 03 '11 at 15:09