Adding long doubles gives the wrong answer in C++

Question

I have a section of code that reads:

std::cerr << val1 << " " << val2 << std::endl;
val1 = val1 + val2;
std::cerr << val1 << std::endl;

Both val1 and val2 are long double.

The problem comes from the fact that the result of this is:

-5.000000000000722771452063564190e-01 2.710505431213761085018632002175e-20
-5.000000000000722771452063564190e-01

Which doesn't make sense. It appears that val2 is NOT being added to val1; however, there is obviously enough information in the fractional part of val1 that val2 could be added to it.

I'm stumped, anyone have any ideas?

I believe I'm using GCC 4.2. Does G++ use the IEEE quadruple-precision format, or something else, like the 80-bit extended-precision format? That would explain this problem, though why do more than 18 decimal places show up then?

Obligatory Goldberg link: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html — Robᵩ, Aug 31 '12 at 16:55

perilbrain · Answer 1 · 2012-09-01T17:15:43.340

5

If your val1 and val2 are printed correctly, then the output is correct:

-5.000000000000722771452063564190e-01 = -5.000000000000722771452063564190 × 10^(-1)

where ^ denotes "to the power of", and

2.710505431213761085018632002175e-20 = 2.710505431213761085018632002175 × 10^(-20)

Since |val1| >> |val2|, val2/val1 ≈ -5.4e-20, which is effectively 0 next to 1 .... eq (A)

Consider y = val1 + val2
=> y = ((val1 + val2)/val1) * val1  (multiplying and dividing by val1)
=> y = {(val1/val1) + (val2/val1)} * val1
=> y = {1 + val2/val1} * val1
=> y ≈ {1 + 0} * val1 .........................................from eq (A)
=> y ≈ val1

That's why the output is -5.000000000000722771452063564190e-01: the difference produced by the addition is too small to be represented at val1's magnitude in the binary long double format.

edited Sep 01 '12 at 17:15

answered Aug 31 '12 at 16:37

perilbrain


I think that `e-01` in fact means `* 10 ^ (-01)`. – Matthieu M. Aug 31 '12 at 17:23
@MatthieuM: it does. For Anonymous: also, if you notice, the number of digits of precision available in quadruple-precision floating point is ~35 decimal digits (far greater than the 19 orders of magnitude difference between the two numbers), which means that they should be added. However, as I mentioned in my answer, in G++, long doubles are computed with extended-precision numbers (with approximately 19 digits of precision). Also, you should check your math... it doesn't make much sense (you are dividing by zero in there, among other mistakes). – Andrew Spott Aug 31 '12 at 21:45

score 4 · Accepted Answer · answered Aug 31 '12 at 16:33

Well, I should have guessed... it looks like long double on G++ takes up the storage of a quadruple-precision format, but is computed using an 80-bit extended-precision format. So printing it will give lots of digits, but only some of those (about 19) are actually computed.

Andrew Spott

