I have recently been working with C intrinsics to make my code faster, especially for SIMD implementations. I have the following problem: given a __m128 acc which holds 4 floats, I want to accumulate them into a single float ac.
acc = _mm_hadd_ps(acc,acc);
acc = _mm_hadd_ps(acc,acc);
ac = _mm_cvtss_f32(acc);
This does not compile, however, even though calling _mm_hadd_ps() twice does exactly what I want. The compiler produces the following output:
Compiler command failed with code 1
Command: gcc -pipe -std=c17 -g -gdwarf-4 -O3 -Wall -Wextra -Wpedantic -pedantic-errors -ffreestanding -nostdlib -static -march=k8 -mtune=generic -mno-80387 -mno-mmx -D_MM_MALLOC_H_INCLUDED -Wa,-march=k8+cmov+nommx -Wa,-mx86-used-note=no -Wa,--fatal-warnings -Wl,-n -Wl,--fatal-warnings -Wl,--no-dynamic-linker -Wl,--build-id=none -Wl,-z,defs -Wl,-z,noexecstack -Wl,-z,norelro -Wl,-z,noseparate-code -Wl,-Ttext=0x507340f44000 -Wl,-e,sdot -include defs.inc -o user.elf user.c
In file included from /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/include/immintrin.h:33,
from user.c:2:
user.c: In function 'sdot':
/usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/include/pmmintrin.h:56:1: error: inlining failed in call to 'always_inline' '_mm_hadd_ps': target specific option mismatch
56 | _mm_hadd_ps (__m128 __X, __m128 __Y)
| ^~~~~~~~~~~
user.c:19:16: note: called from here
19 | __m128 acc3 = _mm_hadd_ps(acc2,acc2);
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/include/immintrin.h:33,
from user.c:2:
/usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/include/pmmintrin.h:56:1: error: inlining failed in call to 'always_inline' '_mm_hadd_ps': target specific option mismatch
56 | _mm_hadd_ps (__m128 __X, __m128 __Y)
| ^~~~~~~~~~~
user.c:17:18: note: called from here
17 | __m128 acc2 = _mm_hadd_ps(acc1,acc1);
| ^~~~~~~~~~~~~~~~~~~~~~
What does it mean for a function to have to be inlined, and where in the code do I violate said restriction?
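For reference, here is a minimal sketch of the same reduction written with SSE1 intrinsics only. It rests on an assumption: that the "target specific option mismatch" arises because _mm_hadd_ps is an SSE3 intrinsic (it lives in pmmintrin.h) while -march=k8 only enables instruction sets up to SSE2. The helper name hsum_ps_sse1 is hypothetical and not part of any header.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only, no SSE3 required */

/* Hypothetical helper: horizontally sums the four floats in v.
 * Uses only SSE1 operations, so it should not need -msse3. */
static inline float hsum_ps_sse1(__m128 v)
{
    /* v = [a, b, c, d], element 0 first */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* [b, a, d, c] */
    __m128 sums = _mm_add_ps(v, shuf);           /* [a+b, a+b, c+d, c+d] */
    shuf = _mm_movehl_ps(shuf, sums);            /* [c+d, c+d, d, c] */
    sums = _mm_add_ss(sums, shuf);               /* element 0 = (a+b) + (c+d) */
    return _mm_cvtss_f32(sums);                  /* extract element 0 */
}
```

With this, the three lines above would become ac = hsum_ps_sse1(acc);. If the build flags can be changed and the CPU supports it, enabling SSE3 (for example with -msse3 or -march=k8-sse3) should presumably let the original _mm_hadd_ps version compile as well.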