Yesterday I asked a question about why I was losing accuracy in floating-point arithmetic, and I received an answer explaining that it was due to intermediate results being held in x87 registers. This was helpful, but some of the details are still escaping me. Here's a variation of the program I presented in the previous question; I'm using VC++ 2010 Express in debug mode.
#include <stdio.h>

int main()
{
    double x = 1.8939201459282359e-308;    /* subnormal number */
    double tiny = 4.9406564584124654e-324; /* smallest positive subnormal double */
    double scale = 1.6;
    double temp = scale*tiny;
    printf("%23.16e\n", x + temp);
    printf("%23.16e\n", x + scale*tiny);
    return 0;
}
This outputs
1.8939201459282369e-308
1.8939201459282364e-308
The first value is correct according to the IEEE standard. Giving the scale
variable a value of 2.0 gives the correct value for both computations. I understand that temp
in the first calculation is a subnormal value and therefore loses precision. I also understand that the value of scale*tiny
is held in an x87 register, whose greater exponent range means the product doesn't have to be denormalized, so it retains more precision than temp
. What I don't understand is why, when adding the value to x
, we get the correct answer from the lower-precision value. Surely if a lower-precision value can give the correct answer, then a higher-precision value should also give the correct answer? Is this something to do with 'double rounding'?
Thanks in advance; this is a whole new subject for me, so I'm struggling a little.