
If a language wished to offer consistent floating-point semantics both on x87 hardware and on hardware that supports the binary128 type, could existing binary128 implementations operate efficiently under rules that required all intermediate results to be rounded as if to the 80-bit type found on the x87? The x87 cannot efficiently operate with languages which require results to be evaluated at the equivalent of float or double precision because those types have different exponent ranges and thus different behavior with denormalized values; binary128 and binary80, however, appear to use the same size exponent field, so rounding off the bottom 49 bits of the significand should yield consistent results throughout the type's computational range.

Would it be reasonable for a language design to assume that future PC-style hardware will support the 80-bit type either via x87 instructions or via an FPU that can emulate the behavior of the 80-bit type even if values require 128 bits to store?

For example, if a language defined types:

  • ieee32 == Binary32 that is not implicitly convertible to/from any other type except real32 or realLiteral
  • ieee64 == Binary64 that is not implicitly convertible to/from any other type except real64 or realLiteral
  • real32 == Binary32 that eagerly converts to realComp for all calculations, and is implicitly convertible from all real types
  • real64 == Binary64 that eagerly converts to realComp for all calculations, and is implicitly convertible from all real types
  • realComp == Intermediate-result type that takes 128 bits to store regardless of the precision stored therein
  • realLiteral == Type of non-suffixed floating-point literals and constant expressions; processed internally as a maximum-precision value, but only usable as the type of literals and constant expressions; stored at maximum precision except where it would be immediately coerced to a smaller type, in which case it is stored as the destination type.

would it be reasonable for the language to promise that realComp will always be processed at exactly 80-bit precision, or would such a promise be likely to impose an execution-time penalty on some platforms? Would it be better to simply specify it as being 80 bits or better, with a promise that any platform which sometimes carries 128 bits of precision will do so consistently? And what should one try to promise on hardware whose FPU handles exactly binary64? (On a typical 16- or 32-bit micro without a 64-bit FPU, computations on realComp would be faster than on double.)
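
To make the intended semantics concrete, here is a rough C sketch of the promotion behavior, with `long double` standing in for realComp (all the names are hypothetical, and `long double` is the 80-bit type only on x86 targets; elsewhere it may simply be binary64 or binary128):

```c
#include <stdio.h>

/* Hypothetical stand-ins: realComp ~ the 80-bit-or-wider intermediate type,
   real32/real64 ~ storage formats whose arithmetic is performed eagerly in
   realComp.  In C the widening has to be written out explicitly.          */
typedef long double realComp;
typedef float       real32;
typedef double      real64;

static real64 sum3(real64 a, real64 b, real64 c)
{
    realComp t = (realComp)a + (realComp)b + (realComp)c;
    return (real64)t;            /* a single final rounding back to binary64 */
}

int main(void)
{
    /* On x86 (80-bit long double) this prints 10000000000000002; if long
       double is just binary64, or if everything were evaluated in plain
       double, it prints 10000000000000000.                                */
    printf("%.17g\n", sum3(1e16, 1.0, 1.0));
    return 0;
}
```

The point of realComp is that the widening inside sum3 would happen implicitly rather than having to be spelled out at every operation.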

supercat
    Use of "real" in naming finite precision data types is very worrying- it has had a history, going back to the early days of Fortran, of encouraging people to expect real number arithmetic. The single most important thing to understand and remember about floating point is the it has its own arithmetic rules. For example, real number addition is associative. Floating point addition is not. – Patricia Shanahan Oct 22 '14 at 19:01
  • @PatriciaShanahan: Perhaps the naming represents my heritage as a Pascal programmer showing through (its primary FP type is `Real`) but I would suggest that a `real32` endeavors to be, semantically, "the binary32 value which best estimates a particular real number". If I say `real32 r=999999.9;` that shouldn't be taken to mean that I want `r` to equal precisely 999999.875, but rather that I want it to hold the best possible representation of the exact quantity 999999.9 that can be stored precisely into an `ieee32` (which happens to be 999999.875). – supercat Oct 22 '14 at 20:21
  • If you think some other term would be better, while still making clear the fact that values should auto-promote to the biggest practical hardware type, feel free to suggest one. There are times when it's important to have the type of a numerical result precisely match that of the operand (which is what should happen with ieee32 and ieee64 types), but in most usage cases it would be more helpful to perform intermediate computations as accurately as possible (provided there is a *precise* way to store intermediate results!). – supercat Oct 22 '14 at 20:26
  • 1
    @PatriciaShanahan: Incidentally, in many languages, addition is even more non-associative than floating-point. In Java, for example, 2000000000+2000000000+1L == -294967296, but 2000000000+(2000000000+1L) == 4000000001L. Note that in the first situation, the first addition isn't "trying" to compute 4000000000 but failing; it's performing in exactly the manner that `int` arithmetic is defined as behaving. – supercat Oct 22 '14 at 20:37

1 Answer


Although the x87 cannot efficiently operate with languages which require results to be evaluated at the equivalent of float or double precision because those types have different exponent ranges

This is one way to see the situation, especially if you are willing to renounce extended precision and change the x87 FPU control word to round the significand to 53 or 24 bits. There is no way to tell the x87 FPU to limit the range of the exponent by changing the control word, so the exponent aspect of extended-precision ends up being the thorn in the proverbial side. You have to deal with the exponent by conditioning the operands so that extended-precision conditioned denormals match up with standard-precision unconditioned denormals. This blog post of mine also discusses an implementation of unboxed floats that amounts to solving an extra exponent width problem.
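
For what it's worth, narrowing the significand takes only a couple of instructions; a minimal sketch assuming glibc on x86 (MSVC exposes the same control-word bits through `_controlfp`):

```c
#include <fpu_control.h>   /* glibc, x86 only */

/* Make the x87 round every significand to 53 bits.  This only narrows the
   significand: the exponent field keeps its 15-bit range, so behavior near
   overflow/underflow and with denormals still differs from pure binary64. */
static void x87_set_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;   /* precision-control field */
    _FPU_SETCW(cw);
}
```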

If you are unwilling to give up on easily accessible extended precision, say through a long double type that programs can mix freely with float and double, the situation is reversed: the exponents, if they were the only problem, could still be dealt with as above using a couple of extra instructions. The significand, on the other hand, introduces a double-rounding problem that simply cannot be dealt with inexpensively(*).

Figueroa's thesis shows that the basic IEEE 754 operations can be emulated relatively easily when the wider format has at least twice the significand size: double rounding is then “innocuous”. That condition does not hold for 80-bit (64-bit significand) -> 64-bit (53-bit significand), which is the root of the x87 problem, and it would not hold for 128-bit (113-bit significand) -> 80-bit (64-bit significand) either.
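
To make the double-rounding hazard concrete, here is a minimal C sketch; what it prints depends on how the compiler evaluates intermediate results (for instance GCC with `-m32 -mfpmath=387` versus SSE2):

```c
#include <stdio.h>

int main(void)
{
    volatile double a = 1.0;
    volatile double b = 0x1p-53 + 0x1p-64;   /* exactly representable */

    /* Exact sum: 1 + 2^-53 + 2^-64, just above the midpoint between 1 and
       1 + 2^-52, so a single binary64 rounding gives 1 + 2^-52.  Rounded
       first to a 64-bit significand it becomes 1 + 2^-53 (tie to even),
       and storing that to binary64 gives 1.0 (tie to even again): the
       information carried by the 2^-64 bit is lost.                     */
    volatile double sum = a + b;

    printf("%a\n", sum);   /* 0x1.0000000000001p+0 with one rounding,
                              0x1p+0 with x87 double rounding            */
    return 0;
}
```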

But since 128-bit quad-precision is not implemented much in hardware nowadays, the question is rather moot. A hardware implementation of quad-precision for the rest of us could be designed to allow perfect emulation of 80-bit double-extended, with or without changing a control word (and binary32 and binary64 could be emulated relatively easily even if the hardware implementation was uncooperative).

In SSE2, there are dedicated instructions for binary32 and binary64, and some of us are very happy with this situation, as it leaves compilers no excuse to provide anything other than C99 FLT_EVAL_METHOD=0 semantics (this is my conclusion in this blog post). Those of us who want to statically analyze programs quite like FLT_EVAL_METHOD=0 for its clarity, even if extended precision for intermediate results has slightly better properties from the numerical point of view.
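
For reference, a portable C99 program can at least ask which evaluation method a given compiler claims to use (the output depends on the target; GCC with `-m32 -mfpmath=387` typically reports 2, x86-64 with SSE2 reports 0):

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* 0: each operation evaluated in its own type; 1: float arithmetic
       carried out in double; 2: everything carried out in long double;
       negative: indeterminable.  float_t and double_t from <math.h> are
       the types actually used for intermediate results.                */
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    printf("sizeof(float_t) = %zu, sizeof(double_t) = %zu\n",
           sizeof(float_t), sizeof(double_t));
    return 0;
}
```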

(*) I should repeat here a comment that I already posted at the bottom of one of the pages this answer refers to: I am pretty sure that someone once gave me a reference to a study of the exact emulation of binary64 basic operations with only an x87 configured for 64-bit significands. I would very much like to look at this reference again if anyone knows which document this might be.

Pascal Cuoq
  • In what way is FLT_EVAL_METHOD==0 superior to FLT_EVAL_METHOD==2? Kahan's paper http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf would seem to suggest that the latter is semantically superior in essentially every way. There is huge value in having a temporary working type which is somewhat beyond the "main" precision one uses, but successive bits are worth less and less. I find it interesting that people worry about the fact that it might increase the worst-case round-off of `d=d1+d2;` by 0.001ulp, but don't care that the FLT_EVAL_METHOD==0 computation of `d=d1+d2+d3;`... – supercat Oct 23 '14 at 14:56
  • ...has a worst-case round-off that's almost 0.25ulp worse than FLT_EVAL_METHOD==2 when all values are of the same sign, and over 1000ulp worse when values are of mixed sign. I guess if having a 128-bit ALU would ease people's fears of "double-rounding" that would be worth something, but otherwise it seems a waste. – supercat Oct 23 '14 at 14:58
  • @supercat: Most of the bad rap that FLT_EVAL_METHOD==2 gets is due to historical incorrect implementations that spilled temporaries to memory in binary32 or binary64, leading to nastiness such as results changing under different optimization levels, seemingly "identical" computations producing different results, and other forms of unpredictability. The good reason to avoid it today is simply that FLT_EVAL_METHOD==0 allows significantly better performance on common hardware. – Stephen Canon Oct 23 '14 at 15:02
  • @StephenCanon: For code which doesn't need precision, having non-promoting floating-point types could be useful for performance. On the other hand, if code does need precision, I would think having types available that would auto-promote to a slightly-oversized type would in many cases be more performant than all the kludges which would otherwise be necessary to achieve that accuracy. – supercat Oct 23 '14 at 15:13
  • @StephenCanon: It makes me sad that compiler/language vendors have botched floating-point for so long that chip vendors see little need for good floating-point, *especially since FLT_EVAL_METHOD==2 semantics are easier to implement than anything else*. Correct me if I'm wrong, but I think an implementation could legitimately store all non-suffixed constants as binary80, push all floating-point types passed as varargs as binary80, make va_arg of any floating-point type retrieve a binary80 from the stack, and not bother with any support for `float` and `double` beyond... – supercat Oct 23 '14 at 15:44
  • ...the ability to load and store them, round literals that contain suffixes, and create initialized data structures containing them. I would think using `long double` for everything would be *easier* for a compiler than trying to decide when it can get away with spilling a calculation as a binary32. – supercat Oct 23 '14 at 15:50
  • @StephenCanon I would add that the rules of FLT_EVAL_METHOD>0 are difficult to remember for programmers even when they are implemented correctly (although I concede most programmers will only want to remember that the rules have been chosen for the best results). I had to implement an option `-all-rounding-modes-constants` to capture extended-precision constants separately from “FLT_EVAL_METHOD=0 but the program changes the rounding mode”, and I think C11 had to clarify that returning a floating-point value did not round it, which C99 did not make clear. Simplicity in language definitions FTW. – Pascal Cuoq Oct 23 '14 at 17:29
  • C11 clarifies that returning a floating-point value *does* round it (assuming adherence to Annex F): "the expression is converted as if by assignment to the return type of the function and the resulting value is returned to the caller." – Stephen Canon Oct 23 '14 at 17:53
  • 1
    @StephenCanon My point exactly. Wait, are you sure you didn't miss the clause elsewhere in the standard that says that this conversion does not count as a conversion for the purposes of `FLT_EVAL_METHOD`? (And I am, unfortunately, asking this as a serious question.) – Pascal Cuoq Oct 23 '14 at 17:54
  • @supercat: 80-bit is really in no-mans-land. It's not accurate enough to enable more efficient algorithms where extreme precision is required (you really want quad there), but binary64 is plenty accurate for almost all uses and much, much faster (especially with vectorization). – Stephen Canon Oct 23 '14 at 18:00
  • 1
    @PascalCuoq Footnote 362 further clarifies: "Assignment removes any extra range and precision." – Stephen Canon Oct 23 '14 at 18:02
  • 1
    @StephenCanon: The paper I linked suggests that extra bits have diminishing usefulness, but the first few extra bits have enormous value. If SSE were 10 times as fast as x87, could it outpace it computing `d=d1*d2+d3*d4` with all positive `double` operands, even if the only requirement was that the result be strictly within 0.75ulp [allowing a full quarter LSB of excess rounding slop]? – supercat Oct 23 '14 at 18:35
  • @StephenCanon: Simple case: 0x10000000000001*3.0 + 0x18000000000001*6.0; a perfect result would be 54043195528445961 (0xC0000000000009); the closest double is 54043195528445960, but simple evaluation yields 54043195528445968, an error of 0.875ulp. – supercat Oct 23 '14 at 18:41
  • @PascalCuoq: The fact that a particular language spec was written by people who were trying to avoid requiring behaviors contrary to what existing implementations already did should not be regarded as a condemnation of x87-style semantics. I would posit that C should define multiple new types for each numeric storage format, with clear and explicit rules for promotion, rounding (for floating-point types), and wrapping (for discrete types). Having a type `wrap32_t` which behaved much like `uint32_t` but would *not* promote when added to a number, and `wholenum32_t` which... – supercat Oct 23 '14 at 19:00
  • ...would promote, would make it safe to use a 64-bit `int` on any code which had appropriately migrated `uint32_t` to `wrap32_t` and `wholenum32_t`. Likewise, distinct promoting and non-promoting floating-point types would let code which cared about floating-point semantics specify what it actually wants (compiler args could then specify whether `float` and `double` should be tight, wide, or loosey-goosey; programs which care about float precision in a few spots but mostly want speed could use promoting types where needed, and loosey-goosey elsewhere). – supercat Oct 23 '14 at 19:04
  • @supercat: The trouble is that the requirement is essentially *never* "0.75 ulp" for any non-trivial usage. 990 out of 1000 times, it's something like "10 decimal digits" (i.e. ~a million ulps). 9 times it's correctly rounded or exactly-controlled error. 80-bit only helps in the one leftover case (which is usually library implementations, and library writers can use specialized tools). – Stephen Canon Oct 23 '14 at 19:17
  • @supercat: "If SSE were 10 times as fast as x87 ...?" Yes, absolutely. It's something along the lines of 2xMUL, 2xFMA, 3xADD, and it's vectorizable, so in many usages it's faster on the hardware we have today. (Note: extended types are quite useful to library writers, but only if they don't have to write library functions for the extended type as well. If they're a hidden implementation detail, they're great.) – Stephen Canon Oct 23 '14 at 19:27
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/63560/discussion-between-supercat-and-stephen-canon). – supercat Oct 23 '14 at 19:35
  • @PascalCuoq: When you say there's no excuse for anything other than `FLT_EVAL_METHOD==0`, what about an expression like `f=f1+f2*f3;`? I think SSE has a fused-multiply-add which would behave like `f=f1+((double)f2*f3);`, but would be much faster than `f=f1+(float)(f2*f3);`. I would imagine that there are a lot of cases where programmers would like the speed that can be obtained via fused-multiply-add and the semantics would be fine, so that would seem like a pretty good "excuse" for compilers not to use FLT_EVAL_METHOD==0 semantics. – supercat Oct 24 '14 at 18:24
  • @supercat `FLT_EVAL_METHOD>0` does not allow `f1+f2*f3` to be computed by FMA. Either **all** expressions are evaluated to the precision of `long double`, or none of them are (as opposed to some with infinite precision and some with the precision of the type). `FLT_EVAL_METHOD<0` allows anything, but compilers could already define it so before SSE2, so if they did not before SSE2 for whatever reason, that reason must still be just as compelling with it. Note that the programmer can allow a*b+c to be compiled to FMA with `FP_CONTRACT`, but this has little to do with your question. – Pascal Cuoq Oct 24 '14 at 19:04
  • @PascalCuoq: I interpreted your statement as saying that once the x87 is abandoned there would no longer be any performance advantages for compilers which use `FLT_EVAL_METHOD<0` semantics. A compiler which automatically used FMA couldn't claim `FLT_EVAL_METHOD==0`, but could offer better performance in many cases than one which didn't; I'd think it likely that some vendors would value speed over consistency. – supercat Oct 24 '14 at 20:04