
I am aware that real numbers cannot be represented exactly in binary in most cases (even with so-called double precision). For example, 1.0/3.0 is approximated by 0x3fd5555555555555, which actually represents 0.33333333333333331483.... If we perform (1.0/3.0)+(1.0/3.0) then we obtain 0x3fe5555555555555 (so 0.66666666666666662965...), just as expected in the sense of computer arithmetic.

However, when I tried to perform (1.0/3.0)+(1.0/3.0)+(1.0/3.0) by writing the following code

#include <stdio.h>
int main() {
    double result = 1.0 / 3.0;
    result += 1.0 / 3.0;
    result += 1.0 / 3.0;
    printf("%016llx\n", result);
}

and compiling it with the standard GNU C compiler, the resulting program printed 0x3ff0000000000000 (which represents exactly 1). This result confused me, because I initially expected 0x3fefffffffffffff; I did not expect the rounding errors to cancel each other, since both (1.0/3.0) and ((1.0/3.0)+(1.0/3.0)) are smaller than the exact values when represented in binary. I still have not figured out what happened.

I would be grateful if you could let me know possible reasons for this result.

Katie Imach
    *"real numbers cannot be represented exactly in binary"* This is not unique to binary system. Decimal system also cannot display some numbers without infinite precision. – user694733 Jan 04 '18 at 13:01
  • Re "*even though with so-called double precision*", There's nothing "so-called" about double-precision. Single-precision floats have 24 bits of precision, and double-precision floats have a little more than twice that with 53 bits of precision. – ikegami Jan 04 '18 at 13:24
  • Since this is a Q&A forum, I would be grateful if comments added something significant toward solving the issue raised here or clarifying the question. That said, I will consider deleting the first sentence if it leads to unnecessary confusion. – Katie Imach Jan 04 '18 at 13:35
  • The error is just much smaller relative to the mathematical value in base 2 than it is in base 10. You can see this by adding printf("%.19f\n", result); after the variable assignment: you'll see 0.33...33148 and not 0.33...33000. Rounding still gets it to 1.0. Add it 5 times to let the error accumulate enough to become noticeable. – Hans Passant Jan 04 '18 at 13:41
  • Your code has undefined behaviour, since the format string specifies that the argument being formatted is of type `long long unsigned` and the argument is actually of type `double`. When behaviour is undefined, explanations for what happens are implicitly compiler-dependent. Use a format that specifies a type of `double`. – Peter Jan 04 '18 at 13:57
  • I was not aware of that and I sincerely apologize. FYI, if we replace that format with the standard "%.20lf", it prints 1.00000000000000000000, as mathematically expected (a well-defined way to print the raw bit pattern is sketched just below these comments). – Katie Imach Jan 04 '18 at 14:05
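
For reference, here is a minimal sketch (not part of the original thread) of printing the raw bit pattern without the undefined behaviour pointed out above. It assumes double is a 64-bit IEEE 754 type and copies its object representation into a uint64_t before formatting; the %a conversion (C99) is another standard way to display the exact value.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double result = 1.0 / 3.0;
    result += 1.0 / 3.0;
    result += 1.0 / 3.0;

    /* Copy the object representation into an integer of the same size,
       then print that integer; this avoids passing a double to %llx. */
    uint64_t bits;
    memcpy(&bits, &result, sizeof bits);
    printf("bits: %016llx\n", (unsigned long long)bits);

    /* %a shows the exact value as hexadecimal floating point,
       %.20f shows the rounded decimal expansion. */
    printf("hex : %a\n", result);
    printf("dec : %.20f\n", result);
    return 0;
}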

2 Answers


There is no need to consider an 80-bit representation: the results are the same in Java, which, apart from some irrelevant edge cases, requires the same behavior as IEEE 754 64-bit binary arithmetic for its doubles.

The exact value of 1.0/3.0 is 0.333333333333333314829616256247390992939472198486328125

As long as all numbers involved are in the normal range, multiplying or dividing by a power of two is exact. It only changes the exponent, not the significand. In particular, adding 1.0/3.0 to itself is exact, so the result of the first addition is 0.66666666666666662965923251249478198587894439697265625

The second addition does involve rounding. The exact sum is 0.999999999999999944488848768742172978818416595458984375, which is bracketed by the representable numbers 0.99999999999999988897769753748434595763683319091796875 and 1.0. The exact value is exactly half way between the bracketing numbers, so a single bit has to be dropped and the tie must be broken. The default rounding mode rounds ties to even; the least significant significand bit of 1.0 is a zero, so 1.0 is the rounded result of the addition.
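
Here is a small C program (my own sketch, not part of the original answer) that reproduces these values, assuming double is IEEE 754 binary64 and the arithmetic is carried out in double precision, as on typical x86-64 builds. nextafter(1.0, 0.0) gives the representable neighbour just below 1.0, and the Fast2Sum trick recovers the rounding error of the last addition exactly.

#include <stdio.h>
#include <math.h>

int main(void) {
    double third      = 1.0 / 3.0;          /* 0.333333333333333314829616... */
    double two_thirds = third + third;      /* exact: only the exponent changes */
    double sum        = two_thirds + third; /* rounded: the exact sum is not representable */

    printf("third      = %.55f\n", third);
    printf("two_thirds = %.55f\n", two_thirds);
    printf("sum        = %.55f\n", sum);

    /* The representable neighbour just below 1.0, i.e. 1 - 2^-53. */
    printf("below 1.0  = %.55f\n", nextafter(1.0, 0.0));

    /* Fast2Sum (Dekker): because |two_thirds| >= |third| and the default
       rounding mode is round-to-nearest, the rounding error of the last
       addition is itself a representable double and can be recovered. */
    double err = third - (sum - two_thirds);
    printf("rounding error of the last addition = %g\n", err); /* -2^-54 */

    printf("sum == 1.0 ? %s\n", sum == 1.0 ? "yes" : "no");
    return 0;
}

Compiled with, say, gcc prog.c -lm, this prints the decimal expansions quoted above, reports a rounding error of about -5.55e-17 (that is, -2^-54, half the gap between 1.0 and its lower neighbour), and confirms that the computed sum compares equal to 1.0.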

Patricia Shanahan
  • Thank you very much for your answer. Could you let me confirm whether my understanding is correct? This is my current understanding: the exact result of the second addition is 0x3fefffffffffffff8 (17 digits in hex) in binary representation. Since the exact representation needs more bits than the 64-bit format provides, rounding has to be performed, and the exact result is *exactly* half way between 0x3fefffffffffffff and 0x3ff0000000000000. Hence rounding is performed so that the least significant bit is 0 by default, giving 0x3ff0000000000000 (1.0 in decimal representation). Is it correct? – Katie Imach Jan 05 '18 at 02:47
  • Thank you. Maybe I should consult other existing Q&As on SO, but I have an additional question: how does a computer know whether the exact result is closer to one of the two candidates or exactly half way between them? Does it use an arithmetic coprocessor that internally uses more digits than are representable, as Serge suggests? – Katie Imach Jan 05 '18 at 03:03
  • @KatieImach it is indeed a separate question, and one that I believe has already been answered. – Patricia Shanahan Jan 05 '18 at 07:53

That is a good rounding question. If I remember correctly, the arithmetic coprocessor uses 80 bits: 64 precision bits and 15 for the exponent (ref.). That means that internally the operation uses more bits than you can display, and in the end the coprocessor rounds its (more accurate) internal representation to give a 64-bit value. And as the first bit dropped is 1 and not 0, the result is rounded upward, giving 1.

But I must admit I am just guessing here...

But if you try to do the operation by hand, it immediately follows that the addition sets all precision bits to 1 (adding 5555...5 and 555...5 shifted by 1), plus the first bit to drop, which is also 1. So by hand a normal human being would also round upward, giving 1, so it is no surprise that the arithmetic unit is able to do the correct rounding as well.
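
To make that by-hand addition concrete, here is a sketch of my own (not from the original answer) that redoes it with 64-bit integers, assuming the IEEE 754 binary64 format with a 53-bit significand and no 80-bit intermediate. Working in units of 2^-54 keeps every value an exact integer, and it also shows that the single dropped bit makes the sum an exact tie, which the default round-to-nearest-even mode resolves toward 1.0.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* 53-bit significand of the double 1.0/3.0, implicit leading 1 included.
       The double's value is sig_third * 2^-54 (binary 0.010101...01). */
    const uint64_t sig_third = 0x15555555555555ULL;

    /* Work in units of 2^-54 so every intermediate value is an exact integer:
       the double 1.0/3.0 contributes sig_third, and the (exactly computed)
       double (1.0/3.0)+(1.0/3.0) contributes 2*sig_third. */
    uint64_t exact_sum = 2 * sig_third + sig_third;   /* 0x3fffffffffffff = 2^54 - 1 */
    printf("exact sum, in units of 2^-54: %#llx\n", (unsigned long long)exact_sum);

    /* Rounding to a 53-bit significand drops the lowest of these 54 bits. */
    uint64_t kept    = exact_sum >> 1;  /* 0x1fffffffffffff, an odd significand */
    uint64_t dropped = exact_sum & 1;   /* 1: the discarded part is exactly half an ulp */
    printf("kept significand: %#llx, dropped bit: %llu (an exact tie)\n",
           (unsigned long long)kept, (unsigned long long)dropped);

    /* Round to nearest, ties to even (only one bit was dropped, so a dropped 1
       means an exact tie): kept is odd, so round up to kept + 1, which
       renormalises to 1.0 with the even significand 0x20000000000000. */
    uint64_t rounded = (dropped && (kept & 1)) ? kept + 1 : kept;
    printf("rounded significand: %#llx, value: %.17g\n",
           (unsigned long long)rounded, (double)rounded * 0x1p-53);
    return 0;
}

This reproduces the half-way case described in the other answer: the dropped part is exactly one half of a unit in the last place, and the tie is resolved toward the candidate with an even significand, which here is 1.0.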

Serge Ballesta
  • Thank you very much for your answer. Could you let me confirm whether my understanding is correct? This is my current understanding: when we perform (1.0/3.0)+(1.0/3.0), the resulting mantissa in the arithmetic coprocessor is 0x555...5 (16 digits in hex, 64 bits), and the 53rd bit, which is the first to drop, is 0. Hence the result is rounded down when the result is displayed. Conversely, when we perform (1.0/3.0)+(1.0/3.0)+(1.0/3.0), the mantissa is 0xfff...f and the first bit to drop is 1, hence the result is rounded up to yield exactly 1. Is it correct? – Katie Imach Jan 04 '18 at 14:15
  • @KatieImach: As I already said, I am just guessing here. But that's what I think is correct... – Serge Ballesta Jan 04 '18 at 14:18
  • In that case I will accept your answer, although I have to look into some documentation to convince myself about this. Thank you so much! – Katie Imach Jan 04 '18 at 14:22