Inform a C or C++ compiler that a loop length is a multiple of 8

Question

I want to write the following function in C++ (compiling with GCC 11.1 using -O3 -mavx -std=c++17):

#include <cstdint>

void f( float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}

This generates about 60 lines of assembly, many of which deal with the case where n is not a multiple of 8. https://godbolt.org/z/61MYPG7an
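For context, the sketch below is a simplified, hypothetical source-level picture of what that extra code is for: with -mavx one 256-bit register holds 8 floats, so the vectorizer emits an 8-wide main loop plus a scalar loop for the 0 to 7 leftover elements. It is not GCC's actual output (the real assembly may differ in details such as how small n is handled), and the name `f_sketch` is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical sketch of the *shape* of the vectorized code for f:
// an 8-wide main loop followed by a scalar epilogue. Not real compiler output.
void f_sketch(float* __restrict__ a, const float* __restrict__ b,
              const float* __restrict__ c, int64_t n) {
    int64_t i = 0;
    // Largest multiple of 8 not exceeding n (for n >= 0): clear the low three bits.
    const int64_t main_end = n & ~int64_t{7};
    for (; i != main_end; i += 8) {
        // With -mavx this inner loop corresponds to one 256-bit add of 8 floats.
        for (int k = 0; k < 8; ++k) {
            a[i + k] = b[i + k] + c[i + k];
        }
    }
    // Scalar remainder: only does work when n % 8 != 0.
    for (; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

If n is known to be a multiple of 8, the remainder loop never runs, which is why the question asks how to tell the compiler so.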

I know that n is always a multiple of 8. One way I could change this code is to replace for (int64_t i = 0; i != n; ++i) with for (int64_t i = 0; i != (n / 8 * 8); ++i). This generates only about 20 assembly instructions. https://godbolt.org/z/vhvdKMfE9

However, on line 5 of the second Godbolt link, there is an instruction that zeroes the lowest three bits of n. If there were a way to tell the compiler that n is always a multiple of 8, this instruction could be omitted with no change in behavior. Does anyone know of a way to do this with any C or C++ compiler (especially GCC or Clang)? In my case this doesn't actually matter, but I'm curious and not sure where to look.

You already have `#include `, why won't you use those intrinsics directly? — Vlad Feinstein, Aug 20 '21 at 03:36
Hi Vlad, that was left over from other experiments. I don't need it here. Sorry! — Henry Heffan, Aug 20 '21 at 03:40
The answer below is cleaner, but I was saying that you can loop over 8 elements at a time and process them at once using AVX intrinsic functions. — Vlad Feinstein, Aug 20 '21 at 03:45
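A minimal sketch of the intrinsics approach described in the comment above, assuming an x86 CPU with AVX and that n is a multiple of 8 (the precondition is not checked). The function name `f_avx` is hypothetical. The `__attribute__((target("avx")))` annotation is a GCC/Clang extension that lets this one function use AVX intrinsics even if the file is compiled without -mavx; it must still only be called on hardware that supports AVX.

```cpp
#include <cstdint>
#include <immintrin.h>

// Processes 8 floats per iteration with AVX intrinsics.
// Precondition (unchecked): n is a multiple of 8.
__attribute__((target("avx")))
void f_avx(float* __restrict__ a, const float* __restrict__ b,
           const float* __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; i += 8) {
        __m256 vb = _mm256_loadu_ps(b + i);              // unaligned load of b[i..i+7]
        __m256 vc = _mm256_loadu_ps(c + i);              // unaligned load of c[i..i+7]
        _mm256_storeu_ps(a + i, _mm256_add_ps(vb, vc));  // a[i..i+7] = b + c
    }
}
```

Unaligned loads and stores are used because nothing in the question guarantees 32-byte alignment of the arrays.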

score 11 · Accepted Answer · edited Aug 20 '21 at 03:42


Declare the assumption with __builtin_unreachable

#include <cstdint>

void f(float *__restrict__ a, float *__restrict__ b, float *__restrict__ c, int64_t n) {
    if(n % 8 != 0) __builtin_unreachable(); // control flow cannot reach this branch so the condition is not necessary and is optimized out
    for (int64_t i = 0; i != n; ++i) { // if control flow reaches this point n is a multiple of 8
        a[i] = b[i] + c[i];
    }
}

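This produces much shorter code, because the compiler no longer has to handle a remainder of fewer than 8 elements. Note that the promise is not checked: if n is ever not a multiple of 8, the behavior is undefined.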
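Because a violated `__builtin_unreachable` promise is undefined behavior, one option is to check the condition in debug builds and only hand it to the optimizer in release builds. This is a sketch assuming the usual `NDEBUG` convention (defined in release builds, which also disables `assert`); the name `f_checked` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

void f_checked(float* __restrict__ a, float* __restrict__ b,
               float* __restrict__ c, int64_t n) {
#ifdef NDEBUG
    // Release: promise to the optimizer (GCC/Clang). Violating it is UB.
    if (n % 8 != 0) __builtin_unreachable();
#else
    // Debug: abort with a diagnostic if the precondition is violated.
    assert(n % 8 == 0);
#endif
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```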

answered Aug 20 '21 at 03:34 by HTNW · edited Aug 20 '21 at 03:42 by Justin


I suspect you could achieve something similar in MSVC using `__assume(...)`. Either `if (n % 8 != 0 ) __assume(0);`, or `__assume(n % 8 == 0)` – Human-Compiler Aug 20 '21 at 04:24
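Combining the answer and this comment, a portable wrapper could look like the sketch below. The `ASSUME` macro and `f_portable` are hypothetical names; the macro uses MSVC's `__assume` on MSVC, the `__builtin_unreachable` idiom on GCC/Clang, and expands to nothing elsewhere (no optimization hint, but still correct code).

```cpp
#include <cstdint>

// Hypothetical ASSUME(cond): lets the optimizer assume cond is true.
// If cond is false at run time, behavior is undefined, so cond should be
// cheap and free of side effects.
#if defined(_MSC_VER) && !defined(__clang__)
#define ASSUME(cond) __assume(cond)  // MSVC, as suggested in the comment
#elif defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)  // unknown compiler: no hint
#endif

// __restrict (rather than __restrict__) is accepted by MSVC, GCC and Clang.
void f_portable(float* __restrict a, float* __restrict b,
                float* __restrict c, int64_t n) {
    ASSUME(n % 8 == 0);
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```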

