Trapping floating point exceptions and signal handling on Apple silicon

Question

To trap floating point exceptions on MacOS, I use an extension that provides feenableexcept functionality. The original extension (written in 2009) is here

http://www-personal.umich.edu/~williams/archive/computation/fe-handling-example.c

NOTE: If you came across this post to see how you can trap floating point exceptions on MacOS (either with Intel or Apple silicon), you might want to skip over the assembly discussion to the DETAILS below.

I'd now like to update this extension for Apple silicon and possibly remove some outdated code. Digging through fenv.h, it is clear how to update the routines feenableexcept, fegetexcept and fedisableexcept for Apple silicon. However, it is less clear what to do with the assembly code provided in the 2009 extension, or why this code is even included.

The extension provided in the link above is quite long, so I'll just extract the fragments involving the assembly:

#if DEFINED_INTEL

// x87 fpu
#define getx87cr(x)    __asm ("fnstcw %0" : "=m" (x));
#define setx87cr(x)    __asm ("fldcw %0"  : "=m" (x));
#define getx87sr(x)    __asm ("fnstsw %0" : "=m" (x));

// SIMD, gcc with Intel Core 2 Duo uses SSE2(4)
#define getmxcsr(x)    __asm ("stmxcsr %0" : "=m" (x));
#define setmxcsr(x)    __asm ("ldmxcsr %0" : "=m" (x));

#endif  // DEFINED_INTEL

This code is used in a handler for a sigaction mechanism that is provided to report on the type of floating point exception trapped.

fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
  int fe_code = sip->si_code;
  unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

  /* ... see complete code in link above ... */ 
     
    if ( sig == SIGFPE )
    {
#if DEFINED_INTEL
        unsigned short x87cr,x87sr;
        unsigned int mxcsr;

        getx87cr (x87cr);
        getx87sr (x87sr);
        getmxcsr (mxcsr);
        printf ("X87CR:   0x%04X\n", x87cr);
        printf ("X87SR:   0x%04X\n", x87sr);
        printf ("MXCSR:   0x%08X\n", mxcsr);
#endif

        // ....
    }
    printf ("signal:  SIGFPE with code %s\n", fe_code_name[fe_code]);
    printf ("invalid flag:    0x%04X\n", excepts & FE_INVALID);
    printf ("divByZero flag:  0x%04X\n", excepts & FE_DIVBYZERO);
  }
  else printf ("Signal is not SIGFPE, it's %i.\n", sig);

  abort();
}

An example is provided that traps exceptions and handles them through sigaction. The call to feenableexcept will either be a native implementation for systems that have feenableexcept defined (e.g. non Apple hardware) or the implementation provided in the extension linked above.

int main (int argc, char **argv)
{
    double s;
    struct sigaction act;

    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    

//  printf ("Old divByZero exception: 0x%08X\n", feenableexcept (FE_DIVBYZERO));
    printf ("Old invalid exception:   0x%08X\n", feenableexcept (FE_INVALID));
    printf ("New fp exception:        0x%08X\n", fegetexcept ());

    // set handler
    if (sigaction(SIGFPE, &act, (struct sigaction *)0) != 0)
    {
        perror("Yikes");
        exit(-1);
    }

//  s = 1.0 / 0.0;  // FE_DIVBYZERO
    s = 0.0 / 0.0;  // FE_INVALID
    return 0;
}

When I run this on an Intel-based Mac, I get:

Old invalid exception:   0x0000003F
New fp exception:        0x0000003E
X87CR:   0x037F
X87SR:   0x0000
MXCSR:   0x00001F80
signal:  SIGFPE with code FPE_FLTINV
invalid flag:    0x0000
divByZero flag:  0x0000
Abort trap: 6

My questions are:

1. Why are both the assembly code and a call to fetestexcept included in the handler? Are both necessary to report on the type of exception that was trapped?
2. An FE_INVALID exception was trapped by the handler. Why, then, is excepts & FE_INVALID zero?
3. The sigaction handler is completely ignored on Apple silicon. Should it work? Or am I misunderstanding something more fundamental about how signal handling with sigaction works, vs. what happens when an FP exception is raised?

I am compiling with gcc and clang.

DETAILS: Here is a minimal example extracted from the original code that distills my questions above. In this example, I provide the missing feenableexcept functionality for MacOS on Intel or Apple silicon. Then I test with and without sigaction.

#include <fenv.h>    
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__APPLE__)
#if defined(__arm) || defined(__arm64) || defined(__aarch64__)
#define DEFINED_ARM 1
#define FE_EXCEPT_SHIFT 8
#endif

void feenableexcept(unsigned int excepts)
{
    fenv_t env;
    fegetenv(&env);

#if (DEFINED_ARM==1)
    env.__fpcr = env.__fpcr | (excepts << FE_EXCEPT_SHIFT);
#else
    /* assume Intel */
    env.__control = env.__control & ~excepts;
    env.__mxcsr = env.__mxcsr & ~(excepts << 7);
#endif
    fesetenv(&env);
}
#else
/* Linux may or may not have feenableexcept. */
#endif


static void
fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
    int fe_code = sip->si_code;
    unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

    if (fe_code == FPE_FLTDIV)
        printf("In signal handler : Division by zero.  Flag is : 0x%04X\n", excepts & FE_DIVBYZERO);

    abort();
}


void main()
{
#ifdef HANDLE_SIGNAL
    struct sigaction act;
    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &act, NULL);
#endif    
    
    feenableexcept(FE_DIVBYZERO);

    double x  = 0; 
    double y = 1/x;
}

Results without sigaction

On Intel:

% gcc -o stack_except stack_except.c
% stack_except
Floating point exception: 8

And on Apple silicon :

% gcc -o stack_except stack_except.c
% stack_except
Illegal instruction: 4

The above works as expected: the code terminates when the division by zero is encountered.

Results with sigaction

Results on Intel:

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
In signal handler : Division by zero.  Flag is : 0x0000
Abort trap: 6

The code works as expected on Intel. However,

The return from fetestexcept (called from the signal handler) is zero. Why is this? Was the exception cleared before being processed by the handler?

Results on Apple silicon :

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
Illegal instruction: 4

The signal handler is ignored completely. Why is this? Am I missing something fundamental about how signals are processed?

Use of assembly in original code (see link at top of post)

My final question concerns the use of assembly in the original example linked at the top of the post. Why was assembly used to query the flags in the signal handler? Is it not enough to use fetestexcept, or to check siginfo.si_code? Possible answer: fetestexcept, when used inside the handler, doesn't detect the exception (?). (Is this why only 0x0000 is printed from inside the handler?)

Here is a related post with similar questions: How to trap floating-point exceptions on M1 Macs?

`#define setx87cr(x) __asm ("fldcw %0" : "=m" (x));` is super broken. It tells the compiler that `x` is a pure *output* (written by the asm template), but actually runs an asm instruction that reads from it. I expect that to break (because of dead store elimination) in anything except a debug build. Same for the `ldmxcsr` wrapper, which is even more useless because `#include <immintrin.h>` has [`_mm_setcsr`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2,AVX_512&text=ldmxcsr) — Peter Cordes, Apr 11 '22 at 01:39
Unless AArch64 also has two separate FP exception-masks / statuses like x86 does (x87 and SSE), I don't see any reason you'd need custom functions / macros instead of ISO C fenv.h functions. `fetestexcept(FE_DIVBYZERO)` etc. should do the trick. https://en.cppreference.com/w/c/numeric/fenv/fetestexcept — Peter Cordes, Apr 11 '22 at 01:40
Yes - `fetestexcept` will test to see if an exception has occurred, but only after the fact. So it would have to be invoked for every suspect line of code. Whereas `feenableexcept` is a convenience function, (not provided with OSX, for some reason) that just makes use of fegetenv and fesetenv to set the environment to terminate execution whenever an exception occurs - very useful with gdb. — Donna, Apr 11 '22 at 01:59
I mean use `fetestexcept` in your exception handler instead of `getmxcsr`. You don't need an AArch64 port of any of the mxcsr or x87 stuff. — Peter Cordes, Apr 11 '22 at 02:01
Cool - this is what I was hoping for. I do wonder why the assembly was used at all here, though. Was fetestexcept not available in 2009? Or is the fact that the call is within a signal handler that something different has to be done? — Donna, Apr 11 '22 at 04:51
`fetestexcept` would test *either* x87 or SSE exceptions, depending on which the compiler used by default for FP math. (SSE2 for x86-64, except for `long double` using x87...) So there's reason to want to check both to make sure it matches up with fetestexcept. Also, the x87 control word has precision-control bits (to make it always round to the same mantissa precision as `double` or `float`, instead of to full 80-bit), and MXCSR has DAZ / FTZ (denormals are zero / flush to zero) to disable gradual underflow because it's slow if it happens. fenv doesn't portably expose that. — Peter Cordes, Apr 11 '22 at 05:06

Peter Cordes · Answer 1 · 2022-04-11T20:28:07.360

Turns out MacOS on AArch64 will deliver SIGILL, not SIGFPE, for unmasked FP exceptions. How to trap floating-point exceptions on M1 Macs? shows an example including how to unmask specific FP exceptions, and is a duplicate for the actual goal on AArch64. (Linux on AArch64 apparently delivers SIGFPE; I don't know why MacOS would ignore the POSIX standard and deliver a different signal for arithmetic exceptions).
The rest of this answer just covers the x86 asm parts.

I suspect you also need to learn the difference between a POSIX signal like SIGSEGV or SIGFPE, a hardware exception like a page fault or x86 #DE integer divide exception, vs. an "fp exception" (event that either sets a flag in an FPU status register, or if unmasked is treated as a CPU exception, trapping to run kernel code.)

Having FP exceptions unmasked means an FP math instruction can trap (send execution into the kernel, instead of continuing to the next user-space instruction). The OS's trap handler decides to deliver a POSIX signal (or fix the problem itself on pagefault, for example, and return to user-space to rerun the instruction that faulted aka trapped.)

If FP exceptions are masked, they don't result in CPU exceptions (traps), so you can only check them from the same thread with fetestexcept. The point of feenableexcept is to unmask some exceptions.

Unless AArch64 also has two separate FP exception-masks / statuses like x86 does (x87 and SSE), I don't see any reason you'd need inline asm. fenv.h functions should work.

Unfortunately ISO C doesn't provide a way to actually unmask exceptions, just fetestexcept(FE_DIVBYZERO) etc. to check the status flags in the FP-exception state (which stay set if any operation ever raised them, since they were last cleared). https://en.cppreference.com/w/c/numeric/fenv/fetestexcept

But MacOS fenv.h does have some constants for setting the FP exception-mask bits in the FP environment with fegetenv / fesetenv. This is an alternative to GNU C feenableexcept.

Asm / intrinsics on x86 can be useful because it has two independent FP systems, legacy x87 and modern SSE/AVX.

fetestexcept would test either x87 or SSE exceptions, depending on which the compiler used by default for FP math. (SSE2 for x86-64, except for long double using x87...) So there's reason to want to check both to make sure it matches up with fetestexcept.

Also, the x87 control word has precision-control bits (to make it always round to the same mantissa precision as double or float, instead of to full 80-bit), and MXCSR has DAZ / FTZ (denormals are zero / flush to zero) to disable gradual underflow because it's slow if it happens. fenv doesn't portably expose that.
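
For the MXCSR part, a short sketch of reading the DAZ and FTZ bits with the _mm_getcsr intrinsic instead of inline asm. The function names and the MXCSR_DAZ / MXCSR_FTZ macros are made up here; the bit positions (6 and 15) are the ones documented for MXCSR. On builds without SSE2 the functions return -1 to mean "not applicable".

```c
#if defined(__SSE2__)
#include <xmmintrin.h>            /* _mm_getcsr() */

#define MXCSR_DAZ 0x0040u         /* bit 6: denormals-are-zero */
#define MXCSR_FTZ 0x8000u         /* bit 15: flush-to-zero */

/* Return 1 if the bit is set in the current thread's MXCSR, else 0. */
int mxcsr_daz_enabled(void) { return (_mm_getcsr() & MXCSR_DAZ) != 0; }
int mxcsr_ftz_enabled(void) { return (_mm_getcsr() & MXCSR_FTZ) != 0; }
#else
/* No MXCSR on this target. */
int mxcsr_daz_enabled(void) { return -1; }
int mxcsr_ftz_enabled(void) { return -1; }
#endif
```

Both normally report 0 in a fresh process unless something (for example a fast-math startup object) switched them on.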

The x86 inline asm is very naive and broken

If you do actually want wrappers for these x87 operations, look elsewhere for ones written carefully.

#define setx87cr(x) __asm ("fldcw %0" : "=m" (x)); is super broken. It tells the compiler that x is a pure output (written by the asm template), but actually runs an asm instruction that reads from it. I expect that to break (because of dead store elimination) in anything except a debug build. Same for the ldmxcsr wrapper, which is even more useless because #include <immintrin.h> has _mm_setcsr

They all need to be asm volatile, otherwise they're considered a pure function of the inputs, so with no inputs and one output, the compiler can assume that it always writes the same output and optimize accordingly. So if you wanted to read status multiple times to check for new exceptions after each of a series of calculations, the compiler would likely just reuse the first result.

(With only an input instead of an output operand, a correct wrapper for fldcw would be volatile implicitly.)

Another complication is that a compiler could choose to do an FP op earlier or later than you expected. One way you can fix that is by using the FP value as an input, like asm volatile("fnstsw %0" : "=am"(sw) : "g"(fpval) ). (I also used "a" as one of the possible outputs, since there's a form of that instruction which writes to AX instead of memory. Of course you need it to be a uint16_t or short.)

Or use a "+g"(fpval) read+write "output" operand to tell the compiler it reads/writes fpval, so this has to happen before some calculation that uses it.

I'm not going to attempt fully correct versions myself in this answer, but that's what to look for.
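
Purely as an illustration of those points, and not a vetted replacement, here is a sketch of what more careful wrappers might look like for x86-64 with GCC or clang. All function names are hypothetical. The "+x" constraint (value in an SSE register) is swapped in for the "g" mentioned above, on the assumption that double math is done in SSE registers on x86-64.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)

/* fldcw READS its operand, so cw is an input ("m"), not an output. */
static inline void x87_set_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m"(cw));
}

/* volatile: the result can differ between calls, so never reuse it. */
static inline uint16_t x87_get_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* fnstsw can store to memory or to AX, hence the "=am" constraint. */
static inline uint16_t x87_get_sw(void)
{
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=am"(sw));
    return sw;
}

/* Read MXCSR only after *value has been computed: the "+x" operand makes
   the asm statement depend on (and appear to modify) the FP value. */
static inline uint32_t mxcsr_after(double *value)
{
    uint32_t csr;
    __asm__ volatile ("stmxcsr %0" : "=m"(csr), "+x"(*value));
    return csr;
}

/* Usage example: divide, then read MXCSR tied to the quotient. */
uint32_t mxcsr_after_division(double num, double den)
{
    volatile double d = den;   /* keep the division out of constant folding */
    double q = num / d;
    return mxcsr_after(&q);
}

#endif
```

With exceptions masked, mxcsr_after_division(1.0, 0.0) should come back with the divide-by-zero flag (bit 2, value 0x4) set. None of this applies to AArch64, where the fenv.h functions are the thing to use.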

I had originally guessed that s = 0.0 / 0.0; might not be compiling to a divide instruction with clang for AArch64. You might just get a compile-time-constant NaN, and optimize away an unused result, if you don't use something like

    volatile double s = 0.0;
    s = 0.0 / s;             // s is now unknown to the compiler

You can check the compiler's asm output to make sure there is an actual FP divide instruction.

BTW, ARM and AArch64 don't trap on integer division by 0 (unlike x86), but with the FP exception unmasked hopefully FP ops do. But if this still doesn't work, then it's time to read the asm manuals and look at compiler asm output.

I am using gcc and clang. And I just took the example from original code (0/0). But I have tried the same thing with `double x=0; double y = 1/x` and get the same exception. — Donna, Apr 11 '22 at 04:47
I like your first answer - just use fetestexcept. It seems that the assembly is only used to provide the user with info on which exception was trapped. — Donna, Apr 11 '22 at 04:49
However, I am still wondering about the code involving `sigaction` and `SIGFPE`. This does not work on ARM, as far as I can tell. The exception is trapped, but the handler is never called. — Donna, Apr 11 '22 at 04:50
@Donna: My guess was that when compiling for AArch64, no FP exception was being triggered in the first place and that was the reason for no signal being delivered. Your [mcve] didn't use `fetestexcept` in the main thread, only in the signal handler, so it wasn't clear until your comment just now that you had confirmed that you could detect divide by zero in the same thread, just not get a signal delivered. But it sounds like you're saying you did test that and confirm? — Peter Cordes, Apr 11 '22 at 05:14
@Donna: https://en.cppreference.com/w/cpp/numeric/fenv points out `feenableexcept` is a GNU extension. (The glibc manual confirms it's GNU, not even POSIX or something). Is it not available on MacOS? It seems ISO C fenv.h doesn't have facilities to get FP math to deliver signals. — Peter Cordes, Apr 11 '22 at 05:17
(It's unfortunate that the same word "exception" gets used for very different things, setting a sticky flag bit in an FP status register vs. trapping to the OS so it can deliver a signal.) — Peter Cordes, Apr 11 '22 at 05:21
@Donna: You say you "*can trap exceptions on my M1 with feenableexcept.*" so I guess you do have that function on MacOS. What happens when they trap? Your process dies with a SIGFPE? — Peter Cordes, Apr 11 '22 at 05:23
@Donna: Also, you said you used `double x=0; double y = 1/x;`. That omits `volatile`, which was the point of the exercise (unless you compile with optimization disabled, in which case all variables are treated sort of like volatile across statements. Then just splitting it up across two separate statements should work.) Anyway, did you enable all exceptions? Maybe there isn't a separate one for divide by zero vs. invalid on AArch64? — Peter Cordes, Apr 11 '22 at 05:24
No, feenableexcept is not available on MacOS. So I have been using the DIY extension linked to at the top of my post. But that was written for PowerPC and Intel Macs. I now want to extend it to ARM. I was able to extend this to ARM and it works (one caveat: underflows are not detected by fetestexcept, for some odd reason). I'll post a simple example. — Donna, Apr 11 '22 at 06:26
@Donna: Oh, so you didn't actually test the code with `printf ("Old invalid exception: 0x%08X\n", feenableexcept (FE_INVALID));` that was in your question. It's normal that you don't get signals if you didn't *unmask* FP exceptions to make them actually trap. It sounded from your question like you were doing something you expected to work, and that you said involved `feenableexcept`. — Peter Cordes, Apr 11 '22 at 06:29
I think my post confused several different questions (I struggled with how to ask them). The code I ran is from the link at the top of my post (fe-example-except.c). In that code, a DIY "feenableexcept" (defined in the code) is called. This works on Intel Macs. I'd like to update this code for ARM. I have figured out how to do that (see updated post). My question was more about (1) the use of assembly in the example (why was it used here? Am I missing something?) and (2) why is sigaction being used? It doesn't seem necessary, if exceptions are trapped without it. — Donna, Apr 11 '22 at 06:43
My guess is that sigaction provides a "handler" not available through the usual exception handling with "feenableexcept". The handler was useful because it tells exactly what exception was triggered. However, this doesn't work on the M1. I can live with that, but maybe it should work on the M1 (i.e. Apple silicon). See updated post above. — Donna, Apr 11 '22 at 06:45
@Donna: The default action for SIGFPE is to terminate the process, like SIGSEGV. If the CPU traps because of arithmetic (e.g. because you unmasked an FP exception and then did FP math that raised it), the OS will deliver a SIGFPE. BTW, you still haven't updated your question post, it still makes incorrect claims about what you tested (like feenableexcept actually causing traps), and/or claims that refer to code that isn't in the question. — Peter Cordes, Apr 11 '22 at 06:46
@Donna: The PowerPC version seems to implement `feenableexcept` with just some flags in the FP environment, but the trick would be finding the same flags for AArch64 in MacOS headers or manually in a CPU manual. I don't know where in an AArch64 manual to look for it myself, else I'd answer. — Peter Cordes, Apr 11 '22 at 06:53
I suspect you also need to learn the difference between a POSIX signal like SIGSEGV or SIGFPE, a hardware exception like a page fault or x86 `#DE` integer divide exception, vs. an "fp exception" (event that either sets a flag in an FPU status register, or *if unmasked* is treated as a CPU exception, trapping to run kernel code.) What you're saying / asking seems to conflate these things. Having FP exceptions *unmasked* is what gives you signals. If they're masked, you can only check them from the same thread with `fetestexcept`. The point of `feenableexcept` is to unmask some exceptions. — Peter Cordes, Apr 11 '22 at 06:55
See updated post. I provided two examples - one for `feenableexcept`, and one for `sigaction`. I think I found the flags for the AArch64 environment. I am mostly just interested in the five exceptions listed in fenv.h (divbyzero, invalid, overflow, underflow, inexact), although the last two are mostly uninteresting. — Donna, Apr 11 '22 at 07:35

score 1 · Answer 2 · answered Feb 01 '23 at 05:57

GCC has an fpu-aarch64.h header in libgfortran/config which implements everything needed to handle FP exceptions on Apple M-series chips.

barracuda156
