Vector overload of a function (provide a manually vectorized version of a function for auto-vectorization to use)

Question

I am using C, and I want to have two versions of the same function: a scalar version and a vector version. The two functions have the same signature, and the compiler should pick the correct version depending on the context. If the context is an auto-vectorized loop, it should pick the vectorized function; otherwise it should pick the scalar version.

How can I achieve this so that it works in both GCC and Clang?

One of the suggestions is to use #pragma omp declare simd ..., but this doesn't work for me because I need a different implementation for each of the two versions (the vectorized version is implemented using vector intrinsics).

You can't overload functions in C. Is using C++ an option for you? — chtz, Sep 15 '22 at 12:24
I think GCC does something like this for math library functions if a vector math library is available (like glibc's [libmvec](https://sourceware.org/glibc/wiki/libmvec)), but I think that support is specific to those function names. I don't know of a general way to provide a manually-vectorized implementation of a scalar function for GCC to use while auto-vectorizing. That libmvec page does mention and link stuff about a "vector ABI", but that would only be relevant for non-inline functions if it can be used at all for functions other than `exp` / `log` / `sincos` etc. — Peter Cordes, Sep 15 '22 at 17:46
@chtz: C++ wouldn't help; auto-vectorizing a loop that calls `int foo(int)` doesn't make GCC look for `__m128i foo(__m128i)`. It will inline `foo` if it can and then auto-vectorize the whole thing, unfortunately not giving you a way to supply a manually vectorized building-block for part of it. — Peter Cordes, Sep 15 '22 at 17:49

score 3 · Answer 1 · answered Sep 16 '22 at 20:12

Let's say we have a function int square(int num) { return num*num; } and we want to provide an explicit, manually vectorized version of it.

We compile with the -fopenmp-simd, -mavx2 and -O3 compilation flags. -fopenmp-simd makes the compiler honor the OpenMP SIMD pragmas, -mavx2 enables the AVX2 instruction set, and -O3 enables auto-vectorization.

In the first compilation unit we have something like this:

#pragma omp declare simd notinbranch
int square(int num);

...
#pragma omp simd
for (int i = 0; i < SIZE; i++) {
    res[i] = square(values[i]);
}

Because of the declare simd pragma, the compiler assumes that vectorized versions of square exist (here, in another compilation unit) and can call them from the simd loop.
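For reference, the calling side can be wrapped into a complete translation unit along the following lines. This is only a sketch: the helper name square_all is hypothetical, and a scalar definition of square is included solely so the file links on its own. In the setup described in this answer, that definition would live in the other compilation unit.

```c
#include <stddef.h>

/* Tells the compiler that vector variants of square exist and may be
   called unconditionally (notinbranch) from vectorized loops. */
#pragma omp declare simd notinbranch
int square(int num);

/* Hypothetical helper: squares n values. With -fopenmp-simd the loop
   is a candidate for vectorization using the declared variants. */
void square_all(const int *values, int *res, size_t n) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        res[i] = square(values[i]);
    }
}

/* Stand-in scalar definition so this sketch links by itself
   (assumption: normally defined in a separate compilation unit). */
int square(int num) {
    return num * num;
}
```

Without -fopenmp-simd the pragmas are ignored and the loop simply calls the scalar function, so the result is the same either way.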

In another compilation unit, we define square, both its scalar and vector counterparts. The scalar counterpart has a simple name square, whereas the vector counterparts use name mangling as described here.

In the other compilation unit we define:

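#include <stdio.h>
#include <immintrin.h>

/* Scalar version. The printf calls only trace which version runs. */
int square(int num) {
    printf("1");
    return num * num;
}

/* AVX2 version: squares 8 ints per call. */
__m256i _ZGVdN8v_square(__m256i num) {
    printf("2");
    return _mm256_mullo_epi32(num, num);
}

/* AVX version: squares 4 ints per call. */
__m128i _ZGVcN4v_square(__m128i num) {
    printf("3");
    return _mm_mullo_epi32(num, num);
}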

The first function square is the scalar version, the second _ZGVdN8v_square is a version that processes 8 integers in one call, and the third version _ZGVcN4v_square processes 4 integers in one call.
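Since the vector variants are ordinary functions, they can also be called directly to check them against the scalar version. The sketch below does this for the 4-lane variant. It makes some assumptions that are not part of the answer: the variant is written with the GCC/Clang vector_size extension instead of __m128i so that it builds without any -m flags, and the names v4si and variant_matches_scalar are hypothetical.

```c
/* 4 x int vector using the GCC/Clang vector extension (assumption:
   used here in place of __m128i so no -m flags are required). */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar reference version. */
int square(int num) {
    return num * num;
}

/* 4-lane unmasked variant: the element-wise multiply squares each lane. */
v4si _ZGVcN4v_square(v4si num) {
    return num * num;
}

/* Hypothetical helper: returns 1 if every lane of the vector variant
   matches the scalar function for the same inputs, otherwise 0. */
int variant_matches_scalar(const int in[4]) {
    v4si v = { in[0], in[1], in[2], in[3] };
    v4si r = _ZGVcN4v_square(v);
    for (int i = 0; i < 4; i++) {
        if (r[i] != square(in[i])) {
            return 0;
        }
    }
    return 1;
}
```

A check like this is worth having because the compiler never verifies that a hand-written variant computes the same thing as the scalar function; it only trusts the declaration.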

For this example, name mangling goes like this:

_ZGV is the vector prefix
d is the AVX2 ISA, c is the AVX ISA
N is the unmasked version (corresponds to notinbranch in the declaration of square)
4 and 8 are the vector lengths
v stands for a vector parameter
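The pieces listed above can be assembled mechanically. The following sketch builds a mangled name from its components. The function simd_variant_name and its parameters are hypothetical, and the use of M for the masked (inbranch) case is taken from the x86 vector function ABI rather than from this answer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "_ZGV<isa><mask><vlen><params>_<name>"
   into buf and returns snprintf's result (the length that was needed,
   or a negative value on error). */
int simd_variant_name(char *buf, size_t size, char isa, int masked,
                      int vlen, const char *params, const char *name) {
    /* N = unmasked (notinbranch), M = masked (inbranch). */
    char mask = masked ? 'M' : 'N';
    return snprintf(buf, size, "_ZGV%c%c%d%s_%s",
                    isa, mask, vlen, params, name);
}
```

For example, calling it with isa 'd', masked 0, vlen 8, params "v" and name "square" yields _ZGVdN8v_square, the name of the AVX2 variant used in this answer.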

I tested it with GCC and it works.
