
I'm trying to build a project with both clang and gcc, but I'm seeing some odd differences when using _mm_max_ss, e.g.:

__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);

I expected std::max-like behavior when NaNs are involved, but clang and gcc give different results:

Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

_mm_max_ps does the expected thing when I use it. I've tried both -ffast-math and -fno-fast-math, but neither seems to have an effect. Any ideas on how to make the behavior consistent across compilers?

Godbolt link here

phuclv
Biggy Smith
    This well-written post https://stackoverflow.com/a/40199125/12939557 talks about your issue and it explains why this is happening. However, it seems that this issue should have been fixed in GCC 7 while it is obviously still there in the latest version of GCC... I think it is wise to open a ticket for this on the GCC bug tracker. – Jérôme Richard Mar 09 '21 at 21:16
    I have filed a bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497 for the main issue – Biggy Smith Mar 09 '21 at 22:46

1 Answer


My understanding is that IEEE-754 requires (NaN cmp x) to return false for every comparison operator cmp in {==, <, <=, >, >=}, and true for {!=}. An implementation of a max() function might be defined in terms of any of the inequality operators.

So, the question is, how is _mm_max_ss implemented? With {<, <=, >, >=}, or a bit comparison?

Interestingly, when disabling optimization in your link, the corresponding maxss instruction is used by both gcc and clang. Both yield:

2.000000 0.000000 0.000000 0.000000 
nan 0.000000 0.000000 0.000000

This suggests, given max(NaN, 2.0f) -> 2.0f, that max(a, b) = (a op b) ? a : b, where op is a greater-than comparison (> or >=). With IEEE-754 rules, the result of this comparison is always false when either operand is NaN, so:

(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN)

With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would - correct 'as-if' behaviour. GCC is either falling back on another implementation of max() - it uses the GMP and MPFR libraries for compile-time numerics - or is just being careless with the _mm_max_ss semantics.

GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.


From Intel's ISA reference:

If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

Brett Hale
    If I'd seen the answer linked in the comment above, I probably wouldn't have bothered! In any case it must still be regarded as a bug in gcc. It may also be significant that there are two different NaN encodings in the gcc asm: `0x7fc00000` and `0x7ff80000` - Something strange going on there... – Brett Hale Mar 09 '21 at 21:56
  • I appreciate that you did, as it gives some extra context to the situation. – Biggy Smith Mar 09 '21 at 22:47