
So, I am trying to return from a floating point exception, but my code keeps looping instead. I can actually exit the process, but what I want to do is return and redo the calculation that causes the floating point error.

The FPE occurs because I have a random number generator that generates coefficients for a polynomial. Using some LAPACK functions, I solve for the roots and do some other things. Somewhere in this math-intensive chain, a floating point exception occurs. When this happens, I want to increment the random number generator state and try again until the coefficients are such that the error doesn't materialize. It usually doesn't, but on the rare occasions that it does, the results are catastrophic.

So I wrote a simple test program to learn how to work with signals. It is below:

In exceptions.h

#ifndef EXCEPTIONS_H
#define EXCEPTIONS_H

#define _GNU_SOURCE

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <fenv.h>

void overflow_handler(int);

#endif // EXCEPTIONS_H //

In exceptions.c

#include "exceptions.h"

void overflow_handler(int signal_number)
{
    if (feclearexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID)){
        fprintf(stdout, "Nothing Cleared!\n");
    }
    else{
        fprintf(stdout, "All Cleared!\n");
    }

    return;
}

In main.c

#include "exceptions.h"


int main(void)
{
    int failure;
    float oops;

    //===Enable Exceptions===//
    failure = 1;
    failure = feenableexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID);
    if (failure){
        fprintf(stdout, "FE ENABLE EXCEPTIONS FAILED!\n");
    }

    //===Create Error Handler===//
    signal(SIGFPE, overflow_handler);

    //===Raise Exception===//
    oops = exp(-708.5);
    fprintf(stdout, "Oops: %f\n", oops);

    return 0;
}

The Makefile

#===General Variables===#
CC=gcc
CFLAGS=-Wall -Wextra -g3 -Ofast

#===The Rules===#
all: makeAll

makeAll: makeExceptions makeMain
    $(CC) $(CFLAGS) exceptions.o main.o -o exceptions -ldl -lm

makeMain: main.c 
    $(CC) $(CFLAGS) -c main.c -o main.o

makeExceptions: exceptions.c exceptions.h
    $(CC) $(CFLAGS) -c exceptions.c -o exceptions.o 

.PHONY: clean

clean:
    rm -f *~ *.o

Why doesn't this program terminate when I am clearing the exceptions, supposedly successfully? What do I have to do in order to return to the main, and exit?

If I can do this, I can put code in between returning and exiting, and do something after the FPE has been caught. I think I will set some sort of flag, and then, depending on whether that flag is set, clear the most recent data in the data structures, redo the calculation, and so on. The point is that the real program must neither abort nor loop forever; it must handle the exception and keep going.

Help?

The Dude
  • What do you see when you execute the program above? – ouah Jun 16 '15 at 18:11
  • I see, "All Cleared!" in stdout...infinitely...it never stops. – The Dude Jun 16 '15 at 18:12
  • What system (compiler version, OS and architecture)? – ouah Jun 16 '15 at 18:13
  • gcc 4.6, Ubuntu 12.04.5 LTS, 3.13.0-54-generic #91~precise1--i686 i686 i386 GNU/Linux – The Dude Jun 16 '15 at 18:16
  • Are you sure you are clearing *all* of the exceptions occurred? You should check it with `fegetexceptflag`. Or alternatively clear `FE_ALL_EXCEPT`. – Eugene Sh. Jun 16 '15 at 18:24
  • It's easy enough to paste the various code regions in your question into a single file, but it's even better if you just give us a single file to paste. – tmyklebu Jun 16 '15 at 18:24
  • @Eugene. Just tested with clearing FE_ALL_EXCEPT...same problem. – The Dude Jun 16 '15 at 18:28
  • @tmyklebu Perhaps, but this is how I have it setup. I have no idea what is going on, so I don't want to change much. – The Dude Jun 16 '15 at 18:29
  • Look [here](http://technopark02.blogspot.ca/2005/10/handling-sigfpe.html). It explains that the return address points to the same instruction that caused the exception. Anyway, it is UB that happens to work this way in this specific case. – Eugene Sh. Jun 16 '15 at 18:33
  • @Eugene -- I think this is what is going on! What the link describes is exactly what is happening. So, how can I update the instruction pointer in my example? – The Dude Jun 16 '15 at 18:35
  • You shouldn't. It's bad practice to rely on undefined behaviour behaving in a defined way. The handler should not return. – Eugene Sh. Jun 16 '15 at 18:37
  • @EugeneSh.: You are not forced to write the program in a way that elicits undefined behaviour. Traps in IEEE 754 are specified the way they are explicitly so that you can play these sorts of games. – tmyklebu Jun 16 '15 at 18:37
  • @tmyklebu I agree with the first part, but don't understand the second. – Eugene Sh. Jun 16 '15 at 18:40
  • @Eugene I think he is trying to say that there are multiple options available, even if they are 'bad practice'. I am not one for programming dogma, and want to know all the options, even if they aren't portable and can cause their own issues down the road. At the very least, using these techniques, I can collect the seeds that cause this error and simply discard them when they come up in production. – The Dude Jun 16 '15 at 18:43
  • @TheDude: Any fiddling with the stack might result in disaster if the code is compiled with different options, the code changes, a new compiler version comes out, etc. It is definitely better to catch such errors in advance, even more so as the signals are not guaranteed to be generated. I hopefully never run into such software. – too honest for this site Jun 16 '15 at 18:48
  • @EugeneSh.: See the long comment on my answer. I guess the first salient example that comes to my mind is "extended-exponent double"; it's a pair of a double and an int where the int acts as "extra bits" in the exponent. When an operation results in an overflow, underflow, or unanticipated loss of precision, you need to play games with the exponents of the numbers. If almost all of the operations your program does do not elicit any of these conditions, it may be profitable to use trap handling instead of explicit conditions to implement "extended-exponent double." – tmyklebu Jun 16 '15 at 18:53
  • @Olaf I agree that it is better to catch the error in advance, but right now, I have no method for even determining the state of the program that will generate the error...unless I use all of this voodoo here. – The Dude Jun 16 '15 at 18:54
  • @tmyklebu Thank you for clarification, I am taking look at it. The link you provided is actually a well defined extension of GNU C, which is totally legit to use. Unfortunately I can't find anything regarding the state of the system after such exception occurred. – Eugene Sh. Jun 16 '15 at 18:57
  • @EugeneSh.: Demmel and Li's paper "Faster Numerical Algorithms via Exception Handling" shows how to (ab)use trap handling to speed up condition number estimation and eigenvector computation. Another example where traps used to be used is where people use `double`s just as really big integers; an inexact exception means that you generated an integer too big to fit in a `double`. – tmyklebu Jun 16 '15 at 19:00
  • @Olaf: [Link.](http://www.acsel-lab.com/arithmetic/arith11/papers/ARITH11_Demmel.pdf) – tmyklebu Jun 16 '15 at 19:14
  • Ok, from a short peek, It does not cover aspects of an actual programming language standard much less C's handling. It is solely based on OS-based traps. So: not related to C and its specifics. – too honest for this site Jun 16 '15 at 19:31
  • @Olaf: Is your intent to turn this question from a practical programming problem into a language-lawyering party? – tmyklebu Jun 16 '15 at 19:44
  • Not at all. But I object to hackish solutions used for unclear reasons. It is not even clear what the OP wants to achieve with such hacks. But if you are referring to sticking to the standard: then yes. That's what the tag actually says. Otherwise he should clearly state his OS, build environment, etc., and that he is willing to accept [daemons flying out of his nose](http://www.catb.org/jargon/html/N/nasal-demons.html) after the next libc update. Just to get things right: I can very well accept a bare-metal system using such tricks. But on a PC, there are far too many components involved for a reliable solution. – too honest for this site Jun 16 '15 at 19:58
  • @Olaf: Trap handling isn't really "hackish." And, like I've said to a few other people, Annex H of the C standard explicitly allows trap-and-resume for floating-point exceptions. This flies in the face of Section 7's "thou shalt not return from a SIGFPE handler," which leads me to think Section 7 has careless wording that will soon be fixed. – tmyklebu Jun 16 '15 at 20:08
  • @tmyklebu: "..that will soon": you're part of the standard committee then? Wow! However, note that the trap need not be precise, so full recovery may not be possible. Even if you recover, you would have to check the current system state. If using setjmp, this would likely introduce barriers into your code, bringing performance to its knees. I still cannot see much advantage in that. Also, you should check out "math.h"; there are additional signalling pitfalls. ... – too honest for this site Jun 16 '15 at 20:20
  • @tmyklebu: ... However, I retreat about that not being impossible at all, but keep up my flag for recovery being nonsense at best compared to using safe algorithms. – too honest for this site Jun 16 '15 at 20:20

2 Answers

4

"division by zero", overflow/underflow, etc. result in undefined behaviour in the first place. If the system, however, generates a signal for this, the effect of UB is "suspended". The signal handler takes over instead. But if the handler returns, the effect of UB will "resume".

Therefore, the standard disallows returning from such a situation.

Just think: How would the program have to recover from e.g. DIV0? The abstract machine has no idea about FPU registers or status flags, and even if - what result would have to be generated?

C also has no provisions to unwind the stack properly like C++.

Note also, that generating signals for arithmetic exceptions is optional, so there is no guarantee a signal will actually be generated. The handler is mostly meant to notify about the event and possibly clean up external resources.

Behaviour is different for signals which do not originate from undefined behaviour, but just interrupt program execution. Returning from those is well defined, as the program state is well defined.

Edit:

If you have to rely on the program continuing under all circumstances, you have to check all arguments of arithmetic operations before doing the actual operation and/or use safe operations only (re-order, use larger intermediate types, etc.). One example for integers might be to use unsigned instead of signed integers, as overflow behaviour is well defined for those (wrap), so overflowing intermediate results will not cause trouble as long as that is corrected afterwards and the wrap is not too large. (Disclaimer: that does not always work, of course.)

Update:

While I am still not completely sure, according to the comments the standard might allow, for a hosted environment at least, using LIA-1 traps and recovering from them (see Annex H). As these are not necessarily precise, I suspect recovery is not possible under all circumstances. Also, math.h might present additional aspects which have to be carefully evaluated.

Finally: I still think there is nothing gained with such an approach, only some uncertainty added compared to using safe algorithms. It would be different if there were not so many different components involved. For a bare-metal embedded system, the view might be completely different.

too honest for this site
  • What undefined behaviour occurs in OP's program? – tmyklebu Jun 16 '15 at 18:26
  • @tmyklebu: According to the C Standard, 7.14.1.1 [ISO/IEC 9899:2011], if a signal handler returns when it has been entered as a result of a computational exception (that is, with the value of its argument of SIGFPE, SIGILL, SIGSEGV, or any other implementation-defined value corresponding to such an exception), then the behavior is undefined (see undefined behavior 130). (See https://www.securecoding.cert.org/confluence/display/c/SIG35-C.+Do+not+return+from+a+computational+exception+signal+handler) – Filipe Gonçalves Jun 16 '15 at 18:30
  • @tmyklebu: For instance "division by zero", overflow/underflow, etc. – too honest for this site Jun 16 '15 at 18:30
  • This is the only true correct answer here. Why is this being downvoted? – Filipe Gonçalves Jun 16 '15 at 18:33
  • @FilipeGonçalves: So don't return. – tmyklebu Jun 16 '15 at 18:35
  • I think the answer should be reinforced by the info contained in the comments. – Eugene Sh. Jun 16 '15 at 18:35
  • @EugeneSh.: I truly did not suspect you of being the downvoter (thanks for the upvote). I just hope I made the picture clear. The first paragraph is mostly an illustration of, and reasoning about, what happens and why. I'm not sure I found the right phrases to make it clear, so your comments are welcome. – too honest for this site Jun 16 '15 at 18:52
  • @tmyklebu: Yes, that is the only proper conclusion. – too honest for this site Jun 16 '15 at 18:54
  • @FilipeGonçalves: Annex H actually contemplates trap-and-resume behaviour in floating-point traps. I suspect you're basing everything on some careless wording in 7.14.1.1. – tmyklebu Jun 16 '15 at 19:48
  • @tmyklebu Yes, you are right. I think it's a small glitch in 7.14.1.1 - it should have a reference to annex H. – Filipe Gonçalves Jun 16 '15 at 19:50
  • @tmyklebu I updated my answer. That's likely my last word about this. As I say: floating point is nasty. Always feel somewhat dirty after working with them. – too honest for this site Jun 16 '15 at 20:29
3

I think you're supposed to mess around with the calling stack frame if you want to skip an instruction or break out of exp or whatever. This is high voodoo and bound to be unportable.

The GNU C library lets you call sigsetjmp() outside of a signal handler and then siglongjmp() back to that point from inside the handler. This seems like a better way to go. Here is a self-contained modification of your program showing how to do it:

#define _GNU_SOURCE /* feenableexcept() is a glibc extension */

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <fenv.h>

sigjmp_buf oh_snap;

void overflow_handler(int signal_number) {
    (void)signal_number; /* unused */
    if (feclearexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID)){
        fprintf(stdout, "Nothing Cleared!\n");
    }
    else{
        fprintf(stdout, "All Cleared!\n");
    }
    /* Jump back to the sigsetjmp() in main() instead of returning;
       returning would resume at the instruction that trapped. */
    siglongjmp(oh_snap, 1);
}

int main(void) {
    float oops;
    /* feenableexcept() returns the previously enabled set, or -1 on failure. */
    if (feenableexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID) == -1){
        fprintf(stdout, "FE ENABLE EXCEPTIONS FAILED!\n");
    }
    signal(SIGFPE, overflow_handler);
    if (sigsetjmp(oh_snap, 1)) {
      /* Reached via siglongjmp() from the handler. */
      printf("Oh snap!\n");
    } else {
      oops = exp(-708.5);
      fprintf(stdout, "Oops: %f\n", oops);
    }
    return 0;
}
tmyklebu
  • The signal handler does not return. – tmyklebu Jun 16 '15 at 18:36
  • @FilipeGonçalves: What part of "return" is not clear to you? This is *what traps in the floating-point standard are for*. – tmyklebu Jun 16 '15 at 18:38
  • I personally could use this answer, though, I think Eugene illustrated why it is happening and provides an alternate solution. I understand that it is 'bad practice' but I have to do something. Right now, I am not sure what that is, so the more options the better. And yes, I am fully aware that any/all of these may not be portable. That is okay. – The Dude Jun 16 '15 at 18:41
  • Fair enough. I had actually never thought about the `setjmp` / `longjmp` combination to avoid *returning*, which is what causes UB. So, +1. But it's still something I wouldn't do in the interest of code maintenance. I'm sure the OP could come up with a better design to avoid using such a "hack". – Filipe Gonçalves Jun 16 '15 at 18:43
  • @TheDude Hey, please don't. Spaceships are falling down because of these things. Rewrite your code such that it won't cause these exceptions. – Eugene Sh. Jun 16 '15 at 18:44
  • @Eugene Haha yea I understand that. Don't worry, something like this certainly doesn't pass production code muster, but, I need options. Like I said, maybe I can use this for testing and collect all the random number generator states that cause these errors to occur, and discard them once they pop up. Then I will need something else to handle the case where one pops up that I didn't catch before...eh, well...nothing in life is perfect. Though maybe more careful handling of FPE is needed. I am not sure yet. – The Dude Jun 16 '15 at 18:47
  • @EugeneSh.: Traps in the floating-point standard are there so that special handling can happen for overflows, underflows, and inexact operations. People used to write careful code that made nontrivial use of trap handling; you can probably dig up papers with Kahan as an author talking about it. Floating-point traps get mapped to OS signals on Linux, and SIGFPE is the way you handle them. – tmyklebu Jun 16 '15 at 18:49
  • @EugeneSh.: I guess you could read that paragraph in the C standard about "returning" from the handler more broadly to include implementation-defined hacks like longjmp'ing from a handler. But, just as people safely write SIGSEGV handlers to implement write barriers for garbage collection, people can also safely write SIGFPE handlers to deal with floating-point traps in an appropriate way. – tmyklebu Jun 16 '15 at 18:50
  • @tmyklebu: Any _implementation defined_ hack would be subject to optimization settings, compiler version, or even depend on a specific code sequence. That is just crap. And no, you cannot re-interpret the standard here. It is very explicit: this is clearly not _implementation defined_, but _undefined_. – too honest for this site Jun 16 '15 at 19:03
  • @Olaf: It's not crap. My signal handler does not return; it relies on implementation-defined behaviour that will be present in some form anywhere that wants floating-point traps to be useful and generates a SIGFPE when a trap occurs. – tmyklebu Jun 16 '15 at 19:12
  • @tmyklebu: Does that also include behaving well after UB has been exhibited? Fiddling with stack frames, etc. in particular is likely to break after UB. Just think about function inlining, etc. – too honest for this site Jun 16 '15 at 19:34
  • @Olaf: It's unfortunate that floating-point traps cause a SIGFPE, which is supposed to be for *fatal* arithmetic errors. Floating-point traps, if a handling mechanism exists, are *non-fatal*. Have a look at Annex H, section H.3.1 of C99. That explicitly allows trap-and-resume. H.3.1.2 point 3 specifies that SIGFPE is the signal you use for traps. None of this says that traps are UB. All the standard says is that it's UB to *return* from the SIGFPE handler that handles a floating-point trap---and I think even that is a mistake in wording, since returning is how you'd do trap-and-resume. – tmyklebu Jun 16 '15 at 19:43
  • You should probably use `sigsetjmp/siglongjmp`. – ninjalj Jun 16 '15 at 20:54
  • @ninjalj: Hmm. Can you explain why? (What goes on with the signal mask when a signal handler is invoked?) – tmyklebu Jun 16 '15 at 21:21
  • When using `signal()`, it's anyone's guess (BSD vs SysV). BSD behavior would supposedly be to block the signal while the handler is executing. Since you never return, the signal may never be unblocked. You may also unblock it via `sigprocmask()`. – ninjalj Jun 16 '15 at 21:58
  • @ninjalj: What a mess. Thanks for the info. – tmyklebu Jun 16 '15 at 22:06
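To sidestep the BSD-versus-SysV ambiguity of `signal()` discussed in these comments, the handler can be installed with POSIX `sigaction()`, whose semantics are spelled out: with `sa_flags` set to 0, the handler stays installed and SIGFPE is blocked while it runs. The helper names `install_fpe_handler` and `unblock_sigfpe` below are hypothetical; this is a minimal sketch, not a drop-in replacement for the answer's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Install `handler` for SIGFPE with explicit, portable semantics:
 * no extra signals masked, no flags (so SIGFPE itself is blocked
 * for the duration of the handler and the handler stays installed).
 * Returns 0 on success, -1 on failure. */
int install_fpe_handler(void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGFPE, &sa, NULL);
}

/* If a handler is left with plain longjmp() instead of siglongjmp(),
 * SIGFPE can remain blocked; this unblocks it again by hand.
 * Returns 0 on success, -1 on failure. */
int unblock_sigfpe(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGFPE);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```

When the jump target is set with `sigsetjmp(buf, 1)` and the handler leaves via `siglongjmp()`, the saved signal mask is restored automatically and `unblock_sigfpe()` is not needed.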