
I want to write the following function in C++ (compiled with gcc 11.1 using -O3 -mavx -std=c++17):

void f( float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}

This generates about 60 lines of assembly, many of which deal with the case where n is not a multiple of 8. https://godbolt.org/z/61MYPG7an

I know that n is always a multiple of 8. One way I could change this code is to replace for (int64_t i = 0; i != n; ++i) with for (int64_t i = 0; i != (n / 8 * 8); ++i). This generates only about 20 assembly instructions. https://godbolt.org/z/vhvdKMfE9

However, on line 5 of the second godbolt link, there is an instruction to zero the lowest three bits of n. If there were a way to inform the compiler that n will always be a multiple of 8, this instruction could be omitted with no change in behavior. Does anyone know of a way to do this on any C or C++ compiler (especially gcc or clang)? In my case this doesn't actually matter, but I'm interested and not sure where to look.

1 Answer


Declare the assumption with __builtin_unreachable:

void f(float *__restrict__ a, float *__restrict__ b, float *__restrict__ c, int64_t n) {
    // Marking this branch unreachable lets the compiler assume n % 8 == 0;
    // the check itself is optimized out.
    if (n % 8 != 0) __builtin_unreachable();
    for (int64_t i = 0; i != n; ++i) { // from here on, n is known to be a multiple of 8
        a[i] = b[i] + c[i];
    }
}

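This produces much shorter code, because the compiler may now assume that n is a multiple of 8 and no longer has to handle leftover elements.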
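Note that reaching __builtin_unreachable() is undefined behavior, so calling f with an n that is not a multiple of 8 breaks the program silently. One way to guard against that, sketched here as an assumption rather than as part of the answer, is to check the condition with assert in debug builds and only turn it into an optimizer hint when NDEBUG is defined. The macro name ASSUME_CHECKED and the function name f_checked are hypothetical, and the sketch is GCC/Clang only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the answer above).
// Debug builds: verify the condition at run time.
// Release builds (NDEBUG): hand the condition to the optimizer as a fact.
#ifdef NDEBUG
#define ASSUME_CHECKED(cond) \
    do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME_CHECKED(cond) assert(cond)
#endif

void f_checked(float *__restrict__ a, const float *__restrict__ b,
               const float *__restrict__ c, std::int64_t n) {
    ASSUME_CHECKED(n % 8 == 0);  // caller promises n is a multiple of 8
    for (std::int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Building once without -DNDEBUG and running the test suite catches callers that violate the promise before the release build starts relying on it.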

Justin
HTNW
    I suspect you could achieve something similar in MSVC using `__assume(...)`: either `if (n % 8 != 0) __assume(0);` or `__assume(n % 8 == 0)`. – Human-Compiler Aug 20 '21 at 04:24
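The compiler-specific hints mentioned here can be folded into one macro. The following is a minimal sketch, not a tested cross-compiler recipe: the names PORTABLE_ASSUME and f_portable are hypothetical, and whether each compiler actually drops the remainder handling has to be checked in its generated assembly. It uses `__assume` on MSVC, `__builtin_assume` on Clang, and `__builtin_unreachable` on GCC, and falls back to doing nothing elsewhere. `__restrict` is used instead of `__restrict__` because MSVC accepts only the former.

```cpp
#include <cstdint>

// Hypothetical portability wrapper around the per-compiler hints.
#if defined(_MSC_VER) && !defined(__clang__)
#define PORTABLE_ASSUME(cond) __assume(cond)
#elif defined(__clang__)
#define PORTABLE_ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#define PORTABLE_ASSUME(cond) \
    do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define PORTABLE_ASSUME(cond) ((void)0)  // unknown compiler: no hint
#endif

void f_portable(float *__restrict a, const float *__restrict b,
                const float *__restrict c, std::int64_t n) {
    PORTABLE_ASSUME(n % 8 == 0);  // undefined behavior if this is false
    for (std::int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

C++23 also standardizes this idea as the `[[assume(expr)]];` attribute, which could replace the macro on compilers that support it.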