-4

When I run

printf("%.8f\n", 971090899.9008999);
printf("%.8f\n", 9710908999008999.0 / 10000000.0);

I get

971090899.90089989
971090899.90089977

I know why neither is exact, but what I don't understand is why doesn't the second match the first?
I thought basic arithmetic operations (+ - * /) were always as accurate as possible...
Isn't the first number a more accurate result of the division than the second?

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 7
    Erm. How is this not a dupe of our (many) floating point faq items? (In short: it's not decimal arithmetic. Not even at compiletime) – sehe Feb 01 '14 at 01:46
  • Yeah, you surpasses the float type precision. You must use Double values instead. – hmartinezd Feb 01 '14 at 01:48
  • 2
    @hmartinezd completely missing the point of float inexact representations. Also, the constant expression **is** double, not float. – sehe Feb 01 '14 at 01:49
  • @sehe: Which duplicates/FAQs are you referring to? I may very well have missed one (it seems like it should be a duplicate), but with my brief searching I haven't seen any that discusses this yet. I even looked [here](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#704), but what I gathered from it was that basic arithmetic operations should be correctly rounded, which doesn't match what I'm seeing. – user541686 Feb 01 '14 at 01:49
  • 2
    [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – Borgleader Feb 01 '14 at 01:51
  • @Borgleader: [Thanks for not reading the last comment where I linked to the very same thing you linked to -- and in fact also told you why it doesn't help.](http://stackoverflow.com/questions/21493102/why-dividing-a-float-by-a-power-of-10-is-less-accurate-than-typing-the-number-di?noredirect=1#comment32444200_21493102) – user541686 Feb 01 '14 at 01:52
  • 1
    @Mehrdad why does it not help? Why is it "not rounding correctly"? Are you forgetting about the differences between decimal and binary representations? – sehe Feb 01 '14 at 01:54
  • @sehe: Because `9710908999008999.0` and `10000000.0` are both represented *exactly*, and the infinite-decimal expansion of the result is `971090899.9008999`. If IEEE-754 requires correct rounding of basic arithmetic to the nearest binary number -- which I believe it does -- then the result should match what I get when I type the result directly. Where am I going wrong with my reasoning? – user541686 Feb 01 '14 at 01:56
  • @Mehrdad It's worth noting that if you have a reason to think that something should work for these specific numbers (as opposed to arbitrary numbers in the general case), you'll get much closer attention if you mention that in the question. Not everyone will read nine comments down… – Joshua Taylor Feb 01 '14 at 02:02
  • @JoshuaTaylor: Thanks, that's good advice. I thought `9710908999008999.0` was representable exactly... didn't even occur to me that it might round to `9710908999008998.0`. Thanks. – user541686 Feb 01 '14 at 02:04
  • 2
    @Mehrdad: 9710908999008999.0 rounds to 9710908999009000, not 9710908999008998. – Eric Postpischil Feb 01 '14 at 02:55
  • @Mehrdad Confident C _allows_ a higher precision at compile time such that the 2 raw constants created for the `printf()` _could_ be the same - or even more divergent with lower precision platforms. – chux - Reinstate Monica Feb 01 '14 at 03:05
  • @EricPostpischil: Uh no you're wrong. When I run `printf("%.3f\n", 9710908999008999.0);` I get `9710908999008998.000`. In fact what you said wouldn't make any sense because the discrepancy here is an underestimate, not an overestimate. – user541686 Feb 01 '14 at 03:29
  • 2
    @Mehrdad: The C implementation you are using is deficient. Obviously, 9710908999008998 and 9710908999009000 are equidistant from 9710908999008999 (and both are representable in `double`). Their significands are 0x1.1400298aa8173 and 0x1.1400298aa8174. The IEEE-754 rules for rounding equidistant values in the common round-to-nearest mode say to round to an even low bit. Thus, 9710908999009000 is correct. – Eric Postpischil Feb 01 '14 at 03:43
  • @Eric Postpischil Note: Came up with 9710908999009000.000 on Eclipse Indigo 64-bit PC. – chux - Reinstate Monica Feb 01 '14 at 03:51
  • @Mehrdad: Your C implementation does a bad job on the first value too. `971090899.9008999` ought to be converted to 971090899.90090000629425048828125, not 971090899.90089989. – Eric Postpischil Feb 01 '14 at 03:51
  • @EricPostpischil: I should note that my C(++) implementation is nothing other than Visual C++ 2013... I'm really surprised it would get something so simple wrong. – user541686 Feb 01 '14 at 03:56
  • 1
    @Mehrdad: Visual C++ is known not to conform to IEEE 754. I have presented mathematical evidence. There is no question about this. – Eric Postpischil Feb 01 '14 at 04:02
  • @EricPostpischil: Yeah I understand what you're saying, I'm just surprised. Thanks for pointing it out. – user541686 Feb 01 '14 at 04:23
  • @EricPostpischil: By the way, where have you seen it be "known" for VC++ to not be IEEE 754 compliant? When I look on [this page I see *"(see IEEE 754 and the C99 standard)"*](http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx)... which doesn't imply compliance, but it at least indicates attempts at compliance, and I haven't seen information to the contrary anywhere. Is the floating point model and perhaps the lack of IEEE 754 compliance documented somewhere? – user541686 Feb 01 '14 at 05:01

2 Answers2

10

Judging from the numbers you're using and based on the standard IEEE 754 floating point standard, it seems the left hand side of the division is too large to be completely encompassed in the mantissa (significand) of a 64-bit double.

You've got 52 bits worth of pure integer representation before you start bleeding precision. 9710908999008999 has ~54 bits in its representation, so it does not fit properly -- thus, the truncation and approximation begins and your end numbers get all finagled.

EDIT: As was pointed out, the first number that has no mathematical operations done on it doesn't fit either. But, since you're doing extra math on the second one, you're introducing extra rounding errors not present with the first number. So you'll have to take that into consideration too!

  • 2
    If you drop the rambling last paragraph, you'll have my upvote. The compiler is required to simply do the float math in the same domain. Doesn't have to do with runtime (also: that would be horrific if true :)) – sehe Feb 01 '14 at 02:01
  • Ooooh I totally missed that the number on the left was rounded to `9710908999008998.0`... that makes sense, thanks! – user541686 Feb 01 '14 at 02:02
  • 971090899.9008999 does not fit in a `double` significand either. Both 9710908999008999 and 971090899.9008999 are rounded by conversion to `double`; this answer does not explain why the rounding of one (followed by division by 10000000.0) produces a different result from the rounding of the other. – Eric Postpischil Feb 01 '14 at 02:53
  • @ThePhD Minor: I'd say 53 bits integer (not counting another 1 for the sign) with IEEE 754. Still 9710908999008999 needs 54 bits – chux - Reinstate Monica Feb 01 '14 at 03:08
  • @chux -- Eh, the assumed `1` leading the significand is mostly fixed anyhow: saying 52 is just fine. –  Feb 01 '14 at 03:26
  • @ThePhD The encoding details of an IEEE 754 double are not applicable. Whatever it encoding, it can represent all the unsigned integers that a `uint53_t` could (and all the integers a `int54_t` could.), hence the reasoning for asserting 53 (vs 52) bits worth of positive integer representation. – chux - Reinstate Monica Feb 01 '14 at 03:43
8

Evaluating the expression 971090899.9008999 involves one operation, a conversion from decimal to the floating-point format.

Evaluating the expression 9710908999008999.0 / 10000000.0 involves three operations:

  • Converting 9710908999008999.0 from decimal to the floating-point format.
  • Converting 10000000.0 from decimal to the floating-point format.
  • Dividing the results of the above operations.

The second of those should be exact in any good C implementation, because the result is exactly representable. However, the other two add rounding errors.

C does not require implementations to convert decimal to floating-point as accurately as possible; it allows some slack. However, a good implementation does convert accurately, using extra precision if necessary. Thus, the single operation on 971090899.9008999 produces a more accurate result than the multiple operations.

Additionally, as we learn from a comment, the C implementation used by the OP converts 9710908999008999.0 to 9710908999008998. This is incorrect by the rules of IEEE-754 for the common round-to-nearest mode. The correct result is 9710908999009000. Both of these candidates are representable in IEEE-754 64-bit binary, and both are equidistant from the source value, 9710908999008999. The usual rounding mode is round-to-nearest, ties-to-even, meaning the candidate with the even low bit should be selected, which is 9710908999009000 (with significand 0x1.1400298aa8174), not 9710908999008998 (with significand 0x1.1400298aa8173). (IEEE 754 defines another round-to-nearest mode: ties-to-away, which selects the candidate with the larger magnitude, which is again 9710908999009000.)

The C standard permits some slack in conversions; either of these two candidates conforms to the C standard, but good implementations also conform to IEEE 754.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312