I want to write the following function in C++ (compiled with gcc 11.1 using -O3 -mavx -std=c++17):
#include <cstdint>

void f(float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
This generates about 60 lines of assembly, many of which deal with the case where n is not a multiple of 8. https://godbolt.org/z/61MYPG7an
I know that n is always a multiple of 8. One way I could change this code is to replace for (int64_t i = 0; i != n; ++i) with for (int64_t i = 0; i != (n / 8 * 8); ++i). This generates only about 20 assembly instructions. https://godbolt.org/z/vhvdKMfE9
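For clarity, here is the modified function written out in full. This is a minimal sketch; the name f_rounded and the local variable m are only for illustration and do not appear in the godbolt links.

```cpp
#include <cstdint>

// Same loop as f, but the trip count is rounded down to a multiple of 8.
// f_rounded and m are illustrative names.
void f_rounded(float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    const int64_t m = n / 8 * 8;  // equals n whenever n is a multiple of 8
    for (int64_t i = 0; i != m; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

When n really is a multiple of 8, this computes exactly the same result as the original loop.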
However, on line 5 of the assembly in the second godbolt link, there is an instruction that zeroes the lowest three bits of n. If there were a way to inform the compiler that n will always be a multiple of 8, this instruction could be omitted with no change in behavior. Does anyone know of a way to do this on any C or C++ compiler (especially gcc or clang)? In my case this doesn't actually matter, but I'm interested and not sure where to look.
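To make concrete the kind of hint I have in mind: gcc and clang both provide __builtin_unreachable(), so one candidate would be something like the sketch below. The name f_assume is made up for this example, and I have not verified that this actually removes the masking instruction on either compiler; that would need to be checked in the generated assembly.

```cpp
#include <cstdint>

// Candidate hint: declare the "n is not a multiple of 8" path unreachable.
// f_assume is an illustrative name. Calling it with any other n is undefined behavior.
void f_assume(float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    if (n % 8 != 0) {
        __builtin_unreachable();  // gcc/clang builtin; a promise to the optimizer, not a runtime check
    }
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Unlike the n / 8 * 8 version, this does not round anything at run time, so it is only safe if every caller really does pass a multiple of 8.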