Is the flag -ffixed- always bugged in GCC?

Question

I have three versions of gcc installed on my 64-bit Linux machine:

gcc 4.9.2
gcc 5.3.0
gcc 6 [ a build from an svn snapshot ]

All three compilers give me the same error when I try to explicitly reserve xmm registers with

-ffixed-xmm0 -ffixed-xmm1 -ffixed-xmm2 -ffixed-xmm3 -ffixed-xmm4 -ffixed-xmm5 -ffixed-xmm6 -ffixed-xmm7 -ffixed-xmm8 -ffixed-xmm9 -ffixed-xmm10 -ffixed-xmm11 -ffixed-xmm12 -ffixed-xmm13 -ffixed-xmm14 -ffixed-xmm15

and the error is an internal compiler error:

internal compiler error: in copy_to_mode_reg, at explow.c:595
   return (__m128i)__builtin_ia32_paddsw128 ((__v8hi)__A, (__v8hi)__B);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.

Should I file a bug? I have noticed that clang doesn't support a similar flag to control code generation, so maybe gcc added this flag a long time ago and it's just not worth maintaining now?

When I look at the assembly generated from my C function by clang, there is no byte spill and it looks like all the xmm registers are being used as instructed. gcc, on the other hand, doesn't generate clean assembly, and I would still like to impose this behaviour.

Is there another way to force a given usage of SSE and AVX registers? Is it possible to get a warning when the registers are misused?

Thanks.

Dummy function for testing purposes:

#include <stdio.h>
#include <stdint.h>
#include <malloc.h>
#include <emmintrin.h>

typedef int32_t T;

void foo( T * ptr ) { 
  __m128i v0  = _mm_load_si128( (__m128i *) ( &ptr[0] ) );
  __m128i v1  = _mm_load_si128( (__m128i *) ( &ptr[4] ) );
  __m128i v2  = _mm_load_si128( (__m128i *) ( &ptr[8] ) );
  __m128i v3  = _mm_load_si128( (__m128i *) ( &ptr[12] ) );
  __m128i v4  = _mm_load_si128( (__m128i *) ( &ptr[16] ) );
  __m128i v5  = _mm_load_si128( (__m128i *) ( &ptr[20] ) );
  __m128i v6  = _mm_load_si128( (__m128i *) ( &ptr[24] ) );
  __m128i v7  = _mm_load_si128( (__m128i *) ( &ptr[28] ) );
  __m128i v8  = _mm_load_si128( (__m128i *) ( &ptr[32] ) );
  __m128i v9  = _mm_load_si128( (__m128i *) ( &ptr[36] ) );
  __m128i v10 = _mm_load_si128( (__m128i *) ( &ptr[40] ) );
  __m128i v11 = _mm_load_si128( (__m128i *) ( &ptr[44] ) );
  __m128i v12 = _mm_load_si128( (__m128i *) ( &ptr[48] ) );
  __m128i v13 = _mm_load_si128( (__m128i *) ( &ptr[52] ) );
  __m128i v14 = _mm_load_si128( (__m128i *) ( &ptr[56] ) );
  __m128i v15 = _mm_load_si128( (__m128i *) ( &ptr[60] ) );
  v0          = _mm_adds_epi16( v0, v1 );
  v0          = _mm_adds_epi16( v0, v2 );
  v0          = _mm_adds_epi16( v0, v3 );
  v0          = _mm_adds_epi16( v0, v4 );
  v0          = _mm_adds_epi16( v0, v5 );
  v0          = _mm_adds_epi16( v0, v6 );
  v0          = _mm_adds_epi16( v0, v7 );
  v0          = _mm_adds_epi16( v0, v8 );
  v0          = _mm_adds_epi16( v0, v9 );
  v0          = _mm_adds_epi16( v0, v10 );
  v0          = _mm_adds_epi16( v0, v11 );
  v0          = _mm_adds_epi16( v0, v12 );
  v0          = _mm_adds_epi16( v0, v13 );
  v0          = _mm_adds_epi16( v0, v14 );
  v0          = _mm_adds_epi16( v0, v15 );
  _mm_store_si128( (__m128i *) ptr, v0 ); 
}

@PeterCordes I was interpreting that as a policy for the code generation pipeline of `gcc` itself, not as a policy that changes what I can access from my own code. I think it's a little misleading to put that option in the _code generation_ section. At this point I also can't see the use case for this kind of option. – xelp Mar 05 '16 at 04:23
You can access whatever registers you like with inline asm, and that is precisely when you *might* want gcc to keep its hands off a register or two. But nowhere in those intrinsics calls do I see the name of a hardware register, so it must be filled in during *code generation*. – rici Mar 05 '16 at 04:28
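
To illustrate the inline-asm case mentioned in the comment above, here is a minimal sketch that is not from the thread. It assumes GCC or clang targeting x86-64; the helper name adds_in_xmm7 and the choice of xmm7 are arbitrary and purely illustrative. It relies on the GCC extension for local register variables, which only guarantees the named register when the variable is used as an asm operand.

```c
#include <emmintrin.h>  /* SSE2 intrinsics and the __m128i type */

/* Hypothetical helper: saturating 16-bit add done in inline asm, with the
   accumulator pinned to xmm7 (an arbitrary choice for illustration). */
__m128i adds_in_xmm7(__m128i a, __m128i b)
{
    /* GCC extension: a local register variable is only guaranteed to live
       in the named register when it is used as an asm operand. */
    register __m128i acc __asm__("xmm7") = a;

    /* AT&T syntax: paddsw src, dst.
       "+x" is a read-write SSE register operand (here forced to xmm7),
       "x" is any SSE register the compiler chooses. */
    __asm__("paddsw %1, %0" : "+x"(acc) : "x"(b));

    return acc;
}
```

Code like this is where reserving one or two registers might make sense. The intrinsics in the question never name a register, so the compiler still needs free xmm registers to allocate for them.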

score 3 · Accepted Answer · answered Mar 05 '16 at 04:22

You could write that set of command line options much more readably as -ffixed-xmm{0..15} (bash syntax).

I'm not surprised it breaks the compiler when you tell it that all the xmm regs are reserved, and then you try to use intrinsics. The gcc man page says that -ffixed-reg means:

Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer ...

Also, gcc 4.9.2, 5.x, and a gcc6 snapshot all make perfectly fine code. They fold all the aligned loads into memory operands for paddsw, so the function is one movdqa and fifteen paddsw (all to xmm0).

Did you compile without optimization? Of course that asm will be terrible, because -O0 requires every local to be in memory after every C statement.
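
As a hedged sketch of the same point, the computation from the question can be written as a loop with a single accumulator (foo_loop is a made-up name, not from the question). Assuming an optimizing build such as -O2 and no -ffixed-* options, the register allocator is free to keep the running sum in one xmm register and use each aligned load as a memory operand, as described above. The asm for this exact variant is not shown here; check it with gcc -O2 -S.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef int32_t T;

/* Hypothetical variant of foo() from the question: same arithmetic,
   written as a loop with one accumulator instead of v0..v15. */
void foo_loop(T *ptr)
{
    /* _mm_load_si128 and _mm_store_si128 need a 16-byte aligned address. */
    assert(((uintptr_t)ptr & 15u) == 0);

    __m128i acc = _mm_load_si128((const __m128i *)&ptr[0]);
    for (int i = 1; i < 16; ++i) {
        /* With optimization on, this load may be folded into the memory
           operand of paddsw rather than needing its own register. */
        acc = _mm_adds_epi16(acc, _mm_load_si128((const __m128i *)&ptr[4 * i]));
    }
    _mm_store_si128((__m128i *)ptr, acc);
}
```

No register is named anywhere in the source; which xmm registers get used is decided entirely during code generation.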

Peter Cordes


On my machine the asm generated by clang is just more readable; even when using `alloca`, `clang` seems able to generate a cleaner-looking sequence of operations and group the same kind of ops together. But in terms of performance, I can't say anything yet, since I'm still writing intrinsics and have no library to test. – xelp Mar 05 '16 at 04:27
Also, what about byte spills and general misuse of intrinsics? Are there warnings or interesting flags for that? – xelp Mar 05 '16 at 04:31
@xelp: I forget if there are any options to get a warning when gcc knows it's making bad code. In this case, gcc makes ideal code. It doesn't spill anything or waste any instructions. Read the 2nd half of my answer. – Peter Cordes Mar 05 '16 at 04:35
Already did, but my idea was more geared towards warnings and verbose errors. I generally appreciate `gcc` being faster than `clang`, but my problem is that there is nothing to "debug" or enforce a given behaviour for just the intrinsics case. Anyway, thanks for your time. – xelp Mar 05 '16 at 04:41
@xelp: compiling faster? Or making faster / more efficient code? clang sometimes does better than gcc, but also trips up sometimes. – Peter Cordes Mar 05 '16 at 04:45
What I meant was that `gcc` usually generates faster code on my 64-bit Linux box. `clang` is usually better at diagnostics, compile time and "modularity": with LLVM bytecode you can easily split the optimization process as you wish. – xelp Mar 05 '16 at 04:58

score 1 · Answer 2 · answered Mar 05 '16 at 19:02

Almost every time gcc displays a message that starts with "internal compiler error", you should file a bug. The error message usually includes a link to the website where you can file one (e.g. with your distro or with upstream gcc).

Off the top of my head, there are two exceptions to this rule:

If it says something like "internal compiler error: Killed (program xxx)", the majority of the time this is due to your system running out of RAM. Add more RAM, increase swap, or otherwise free up memory on your system.
If you retry the compile command and it works, most of the time this is a bug in your computer rather than in gcc (e.g. the OS is buggy, or the hardware is flaky).

Your example here does not seem to be either of those cases, so if it's still happening with gcc-5.3 and current gcc-6 snapshots, it would be great if you could file a bug. Since you're using gcc-6 snapshots, I assume you built it yourself, so you can go straight to gcc's bugzilla.
