1

I am looking at parallel processing algorithms to improve processing speed, and I want to test Agner Fog's vector class library (VCL).

I am wondering how to choose between the different vector classes, for example Vec16c (SSE2 instruction set) and Vec32c (AVX instruction set).

I am using an Intel® Atom™ x5-Z8350 processor, which according to its specs supports the SSE4.2 instruction set.

How can I choose a vector class that matches the hardware support? For my processor, for example, can I use Vec32c, which is recommended for the AVX instruction set?

A Fog
batuman
  • I have added a new tag: vector-class-library for questions related to the vector class library. – A Fog Jun 12 '20 at 11:56

3 Answers

5

You can use compiler-defined macros to detect which instruction sets are enabled for the target you're compiling for, such as:

// Assume SSE2 as a baseline
#include <vectori128.h>

#if defined(__AVX2__)
#include <vectori256.h>
using vector_type = Vec32c;
#else
// Vec16c uses whatever is enabled, so you don't have to check for SSE4 yourself
using vector_type = Vec16c;
#endif

This doesn't do run-time detection, so only enable AVX2 if you want to make a binary that only runs on CPUs with AVX2.

If you want your code to work on non-x86 platforms, or x86 without SSE2 where VCL isn't supported at all, you need to protect the #include <vectori128.h> with #if as well.
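A guard like the one described above might look like the following sketch. It assumes a C++17 compiler (for `__has_include`) and that the VCL headers are on the include path under the names used above. `HAVE_VCL`, `kStep` and `add_bytes` are hypothetical names introduced here for illustration; they are not part of VCL. When VCL or SSE2 is unavailable, the function falls back to a plain scalar loop.

```cpp
#include <cstddef>
#include <cstdint>

// HAVE_VCL is a hypothetical macro for this sketch, not something VCL defines.
// VCL needs x86 with at least SSE2, so only look for its headers in that case.
#if defined(__SSE2__) || defined(_M_X64)
#  if __has_include(<vectori128.h>)
#    include <vectori128.h>
#    define HAVE_VCL 1
#  endif
#endif

#if defined(HAVE_VCL) && defined(__AVX2__)
#  include <vectori256.h>
using vector_type = Vec32c;   // 32 x 8-bit integers, needs AVX2
#elif defined(HAVE_VCL)
using vector_type = Vec16c;   // 16 x 8-bit integers, SSE2 or better
#endif

// Bytes handled per loop step: 32 or 16 with VCL, 1 for the scalar fallback.
#if defined(HAVE_VCL)
constexpr std::size_t kStep = sizeof(vector_type);
#else
constexpr std::size_t kStep = 1;
#endif

// Element-wise out[i] = a[i] + b[i] on 8-bit integers.
void add_bytes(const std::int8_t* a, const std::int8_t* b,
               std::int8_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(HAVE_VCL)
    // Process whole vectors; load() and store() accept unaligned pointers.
    for (; i + kStep <= n; i += kStep) {
        vector_type va, vb;
        va.load(a + i);
        vb.load(b + i);
        (va + vb).store(out + i);
    }
#endif
    // Scalar loop: handles the tail, or everything when VCL is unavailable.
    for (; i < n; ++i) {
        out[i] = static_cast<std::int8_t>(a[i] + b[i]);
    }
}
```

The calling code never names `Vec16c` or `Vec32c` directly, so the same source builds for SSE2, SSE4.2, AVX2, or a non-x86 target; only the compiler flags change.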

Peter Cordes
Jonas
  • 1
    You'd actually want to use Vec32c if `__AVX2__` is defined, and otherwise always use `Vec16c` (and let vectorclass headers take care of using SSE4.1/4.2 / SSSE3 where useful.) The only other thing you'd want to do with macros in your own code is check for AVX512 and use `Vec64c`. One of the major goals of the vectorclass library is abstracting the selection of different intrinsics based on availability of different target options. – Peter Cordes Nov 24 '16 at 10:41
  • This took a pretty major edit before I could upvote it, but I'm pretty confident it's correct now. I have actually used VCL (and [contributed changes](https://github.com/pcordes/vectorclass) (which AFAIK aren't integrated yet, and I should probably polish up so Agner can include them.)) – Peter Cordes Nov 24 '16 at 10:55
4

AVX is required for 32-byte vectors, and AVX2 for 32-byte integer vectors like Vec32c. Since your Atom doesn't have AVX, don't include Agner's vectori256.h or vectorf256.h, just the 128-bit headers.

Compile with -march=native to get the compiler to enable all the instruction sets your host CPU supports.
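To check what a given set of flags actually enabled, a small helper along these lines can report the widest SIMD extension the compiler is targeting. This is only a sketch: `enabled_simd_level` is a hypothetical name, and the macros are the ones predefined by GCC and Clang (MSVC defines only some of them).

```cpp
// Name of the widest x86 SIMD extension enabled at compile time,
// based on the macros GCC and Clang predefine for the target.
constexpr const char* enabled_simd_level() {
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__) || defined(_M_X64)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing the result from `main` (for example with `std::puts(enabled_simd_level())`) and building once with and once without -march=native shows the difference. On a CPU like the one in the question, which supports SSE4.2 but not AVX, a -march=native build would be expected to report SSE4.2.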

The implementations of the Vec16c functions will automatically use SSE4.2 intrinsics when they're enabled, because Vectorclass checks macros to see what's enabled. So just use Vec16c and you will automatically get the best implementations of every function that your target supports.

(This works because you're selecting the CPU / target options at compile time. If you wanted to do run-time dispatching yourself, it would be harder.)
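For a rough idea of what run-time dispatching involves, here is a minimal sketch. It is not VCL's own mechanism. It assumes GCC or Clang on x86 for the `__builtin_cpu_supports` builtin, and every function name in it is hypothetical. In a real build, each kernel would live in its own source file, compiled with different flags (for example -msse2 and -mavx2) and written with Vec16c or Vec32c; here both are plain loops so the example stands alone.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernels. Stand-ins for a Vec16c build and a Vec32c build of
// the same routine; both are scalar here so the sketch is self-contained.
long sum_bytes_sse2(const std::int8_t* p, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

long sum_bytes_avx2(const std::int8_t* p, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

using sum_fn = long (*)(const std::int8_t*, std::size_t);

// True if the running CPU reports AVX2. Uses a GCC/Clang builtin; on other
// compilers or architectures this sketch conservatively answers false.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Pick the kernel that matches the CPU the program is running on.
sum_fn select_sum_bytes() {
    return cpu_has_avx2() ? sum_bytes_avx2 : sum_bytes_sse2;
}

// Public entry point: the choice is made once, on the first call.
long sum_bytes(const std::int8_t* p, std::size_t n) {
    static const sum_fn impl = select_sum_bytes();
    return impl(p, n);
}
```

The hard part is not the function pointer but the build: the AVX2 kernel must be the only code compiled with AVX2 enabled, otherwise the binary can still fault on a CPU like the Atom before the dispatcher ever runs.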

Peter Cordes
2

The vector class library has been updated and improved. It has moved to GitHub:

https://github.com/vectorclass

A Fog
  • This code can automatically detect the CPU instruction set at runtime and select the appropriate version of the code: https://github.com/vectorclass/version2/blob/master/dispatch_example2.cpp – A Fog Mar 24 '20 at 08:46