
I have the following piece of code that crashes on `assert(!isnan(x))` when compiled with clang. If I compile with `-DWITH_MMX=0`, it runs fine. I observe the same behaviour on Compiler Explorer and locally on my macOS machine.

I don't understand why assigning 42.0 to a `long double` variable produces a NaN when the program uses MMX intrinsics.

I've tried to use Compiler Explorer to figure this out, but I don't get it. Could someone help me understand what's happening here?

```c
// cc -O0 -DWITH_MMX=1 -o nan nan.c
#include <mmintrin.h>
#include <stdio.h>
#include <assert.h>
#include <math.h>
int main()
{
#if WITH_MMX
    __m64 a = _m_from_int(4);
    __m64 b = _m_from_int(8);
    __m64 ab = _m_paddb(a, b);
    int c = _m_to_int(ab);
    assert(c == 12);
#endif
    long double x = 42.0L;
    assert(!isnan(x)); // 42.0 should not be NaN
    printf("done\n");
}
```
user2962393
  • What does `printf("%d %d\n", isnan((double)x), isnan(x));` report? – chux - Reinstate Monica May 02 '23 at 13:53
  • Some compilers fail to automatically link in math support when they should, and the code is seemingly trivial. Perhaps do something more interesting with `long double`? – chux - Reinstate Monica May 02 '23 at 13:56
  • Do you need to explicitly put something that triggers an `EMMS` at the end of the MMX part? – Michael May 02 '23 at 13:58
  • `printf("%d %d\n", isnan((double)x), isnan(x));` prints `1 1`. – user2962393 May 02 '23 at 14:16
  • This is expected. The legacy floating-point instructions and the MMX instructions use the same registers, and they cannot be used for both at the same time. Using MMX instructions marks the registers as invalid for floating-point, and hence you get NaNs. To switch back, you need to execute an `emms` instruction, for which there is the “intrinsic” `_mm_empty()`. However, you also need to coordinate with the compiler, which may not expect you to be tinkering with the floating-point register state. It has been too long since I did that, so I do not recall what you need to do in that regard. – Eric Postpischil May 02 '23 at 14:52
  • Are you sure you want to bother with this? Modern processors have SSE features and others that are superior to MMX, so new programs ought to use those and not MMX unless there is a need to execute on old hardware. – Eric Postpischil May 02 '23 at 14:53
  • Would putting the MMX code and the floating-point code into separate C functions, and calling both, work? – pts May 02 '23 at 15:17
  • @EricPostpischil Switching to SSE is probably the right direction, but it doesn't satisfy my curiosity. `_mm_empty()` is certainly a missing piece in my code, and having learned that is enough to satisfy my curiosity. Thank you. If you post an answer, I'll accept it. – user2962393 May 02 '23 at 17:16

1 Answer


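The legacy floating-point (x87) instructions and the MMX instructions use the same registers, and they cannot be used for both at the same time. Using any MMX instruction other than `emms` marks all of the floating-point registers as in use (called “valid,” meaning that, in ordinary floating-point use, the register contains some floating-point value). This interferes with the ordinary floating-point instructions generated by the compiler: when its code loads a value such as `x` onto the x87 register stack, the FPU finds the stack already full and signals a stack overflow. Because that exception is masked by default, the FPU loads the “real indefinite” NaN instead. That is where the NaN comes from.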

In assembly code, one would switch from using the registers for MMX to using them for legacy floating-point by executing an `emms` instruction, for which there is the “intrinsic” `_mm_empty()`.

However, the compiler might not expect the floating-point registers to be empty. Your MMX code does not necessarily occur at a point where the compiler has emptied them, so executing `emms` at the end of your MMX code might not reproduce the state the compiler expects. To do that, you may need to save and restore the FPU state with the `fxsave` and `fxrstor` instructions. Even then, I would not guarantee this works without investigating the compiler further to be sure it keeps the save instruction, the MMX instructions, and the restore instruction together.

Eric Postpischil