Let's say we have a function int square(int num) { return num*num; } and we want to provide an explicit, manually vectorized version of it.
We compile with the -fopenmp-simd, -mavx2 and -O3 compilation flags. Together they make the compiler honor the OpenMP SIMD pragmas, enable the AVX2 instruction set, and enable vectorization.
In the first compilation unit we have something like this:
#pragma omp declare simd notinbranch
int square(int num);
...
#pragma omp simd
for (int i = 0; i < SIZE; i++) {
    res[i] = square(values[i]);
}
Because of the declare simd pragma, the compiler assumes that vectorized versions of square exist in another compilation unit and can call them from the vectorized loop.
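To make the idea concrete, here is a minimal sketch of the shape such a vectorized loop takes: full chunks of 8 elements go through a vector variant and the leftover elements go through the scalar function. It is only an illustration in portable C. The type vec8i and the names square_scalar, square_vec8 and square_all are hypothetical stand-ins (vec8i plays the role of __m256i), and the code GCC actually emits may differ, for example in how it handles the remainder.

```c
#include <stddef.h>

/* Hypothetical 8-lane stand-in for __m256i, so the sketch needs no AVX2. */
typedef struct { int lane[8]; } vec8i;

/* Scalar version, used for leftover elements. */
static int square_scalar(int num) { return num * num; }

/* Stand-in for the 8-lane vector variant: squares every lane. */
static vec8i square_vec8(vec8i v) {
    for (int k = 0; k < 8; k++) v.lane[k] *= v.lane[k];
    return v;
}

/* Assumed shape of the vectorized loop: chunks of 8, then a scalar tail. */
static void square_all(const int *values, int *res, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        vec8i v;
        for (int k = 0; k < 8; k++) v.lane[k] = values[i + k]; /* load 8 ints */
        v = square_vec8(v);                                    /* one vector call */
        for (int k = 0; k < 8; k++) res[i + k] = v.lane[k];    /* store 8 ints */
    }
    for (; i < n; i++) res[i] = square_scalar(values[i]);      /* remainder */
}
```

Calling square_all(values, res, 11), for instance, would make one vector call for the first 8 elements and three scalar calls for the rest.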
In another compilation unit, we define square, both its scalar and vector counterparts. The scalar counterpart has the plain name square, whereas the vector counterparts use name mangling as described here.
In that compilation unit we define:
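#include <stdio.h>
#include <immintrin.h>

// Scalar version. The printf calls only show which version was called.
int square(int num) {
    printf("1");
    return num * num;
}

// AVX2 version: squares 8 ints per call.
__m256i _ZGVdN8v_square(__m256i num) {
    printf("2");
    return _mm256_mullo_epi32(num, num);
}

// AVX version: squares 4 ints per call.
__m128i _ZGVcN4v_square(__m128i num) {
    printf("3");
    return _mm_mullo_epi32(num, num);
}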
The first function, square, is the scalar version; the second, _ZGVdN8v_square, is a version that processes 8 integers in one call; and the third, _ZGVcN4v_square, processes 4 integers in one call.
For this example, name mangling goes like this:
_ZGV is the vector prefix
d is the AVX2 ISA, c is the AVX ISA
N is the unmasked version (corresponds to notinbranch in the square declaration)
4 and 8 are the vector lengths
v stands for a vector parameter
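The scheme above can be sketched as a small helper that assembles a variant name from its parts. The function vector_variant_name and its parameters are hypothetical and exist only for illustration. The letter M for a masked variant is an assumption taken from the same ABI; it is not used in this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds "_ZGV<isa><mask><vlen><params>_<name>".
   isa:    'c' for AVX, 'd' for AVX2 (the two ISAs used in this example)
   masked: 0 gives 'N' (notinbranch), nonzero gives 'M' (assumed masked form)
   vlen:   number of elements processed per call
   params: one letter per parameter, e.g. "v" for a single vector parameter
   Returns the length of the generated name, as reported by snprintf. */
static int vector_variant_name(char *out, size_t out_size, char isa, int masked,
                               int vlen, const char *params,
                               const char *scalar_name) {
    return snprintf(out, out_size, "_ZGV%c%c%d%s_%s",
                    isa, masked ? 'M' : 'N', vlen, params, scalar_name);
}
```

With a sufficiently large buffer, the arguments ('d', 0, 8, "v", "square") yield _ZGVdN8v_square and ('c', 0, 4, "v", "square") yield _ZGVcN4v_square, the two names defined by hand above.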
I tested it with GCC and it works.