Error due to limited precision of float and double

Question

In C++, I use the following code to work out the order of magnitude of the error due to the limited precision of float and double:

 float n=1;
 float dec  = 1;

 while(n!=(n-dec)) {
    dec = dec/10;
 }
 cout << dec << endl;

(For the double case, I simply replace float with double in lines 1 and 2.)

Now when I compile and run this using g++ on a Unix system, the results are

Float  10^-8
Double 10^-17

However, when I compile and run it using MinGW on Windows 7, the results are

Float  10^-20
Double 10^-20

What is the reason for this?

Something tells me that MinGW is storing the intermediates of `n!=(n-dec)` in 80-bit extended precision. `10^-20` is about the epsilon of 80-bit FP... — Mysticial, Oct 09 '11 at 08:04

score 2 · Answer 1 · answered Oct 09 '11 at 08:09

I guess I'll make my comment an answer and expand on it. This is my hypothesis, I may be wrong.

MinGW on Windows is probably trying to preserve precision by promoting the intermediates of expressions to the full 80-bit precision of x86.

Therefore, both sides of the expression n != (n-dec) are evaluated with 64 bits of precision (80-bit FP has a 64-bit mantissa).

2^-64 ~ 10^-20

So the numbers make sense.
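To spell out the arithmetic: the loop in the question stops once dec drops below half the spacing between 1 and the next smaller representable value, which is 2^-(p+1) for a p-bit mantissa. A minimal sketch of that calculation follows; the helper name `rounding_threshold` is made up for illustration, and round-to-nearest is assumed.

```cpp
#include <cmath>

// Hypothetical helper: for a significand of `digits` bits, values just
// below 1 are spaced 2^-digits apart, so subtracting anything smaller
// than half of that (2^-(digits+1)) from 1 rounds back to 1.
long double rounding_threshold(int digits) {
    return std::ldexp(1.0L, -(digits + 1));
}
```

With 24 bits (float) this gives about 3.0e-8, with 53 bits (double) about 5.6e-17, and with 64 bits (x87 extended) about 2.7e-20. The first power of ten below each threshold is 10^-8, 10^-17 and 10^-20, which matches the reported outputs.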

Visual Studio also promotes intermediates by default, but only up to double precision.
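If this hypothesis is right, forcing the difference to be rounded to the declared type before the comparison should bring back the expected results. Here is a sketch of one way to try that, assuming a store to a `volatile` variable is enough to make the compiler round the value to its declared type (the function name `first_lost_step` is made up for illustration):

```cpp
// Hypothetical helper: the loop from the question, except that n - dec is
// stored in a volatile variable of type T. The store forces the value to be
// rounded to T instead of staying in a wider x87 register.
template <typename T>
T first_lost_step() {
    T n = 1;
    T dec = 1;
    volatile T diff = n - dec;  // rounded to T when written to memory
    while (n != diff) {
        dec = dec / 10;
        diff = n - dec;
    }
    return dec;
}
```

Calling `first_lost_step<float>()` and `first_lost_step<double>()` should then give roughly 10^-8 and 10^-17. With GCC, the flags `-ffloat-store` or `-mfpmath=sse -msse2` may have a similar effect without touching the code, but I have not checked that on MinGW.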

answered Oct 09 '11 at 08:09

Mysticial


The optimization flags used in GCC will likely also alter the output, as it will change the way it handles the floating point numbers. Also note, it isn't just the compiler here, but the chip itself and what floating point mode it is using. Compilers may also generate special truncation commands for floating point. Many variables, but yes, your answer is essentially correct. – edA-qa mort-ora-y Oct 09 '11 at 08:40
@edA-qa mort-ora-y: Agreed. I thought about this a bit more. All that's needed is for the FP mode to be in extended precision. The code is also small enough to not spill x87 registers either. So rounding upon a store won't happen. – Mysticial Oct 09 '11 at 08:44
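One way to see which evaluation mode a given compiler and set of flags claims to use is the `FLT_EVAL_METHOD` macro from `<cfloat>`. A small sketch follows; the helper names are made up for illustration, and the macro only reports what the compiler promises, not how the FPU control word happens to be set at run time.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: turns a FLT_EVAL_METHOD value into a description.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "operations are evaluated in the precision of their own type";
        case 1:  return "float and double operations are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "evaluation precision is indeterminable or implementation-defined";
    }
}

// Description for the compiler and flags this file was built with.
std::string current_eval_method() {
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

A value of 2 would fit the MinGW results in the question, and 0 would fit the g++ results on Unix.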

score 0 · Answer 2 · answered Oct 09 '11 at 08:04

Why don't you check the size of float and double on both OSes?

answered Oct 09 '11 at 08:04

suresh


On W7 that would be float = 32bit, double = 64bit (using `sizeof()`). How do I relate that to my findings from above? – Ben Oct 09 '11 at 08:17

score 0 · Answer 3 · answered Oct 09 '11 at 08:23

This simply shows that the different environments use different sizes for float and double.

According to the C++ specification, double has to be at least as large as float. If you want to find out just how large the types are on your system, use sizeof.
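A minimal sketch of such a check, with made-up helper names. It reports both the storage size and the number of mantissa bits; note that both describe how values are stored, not the precision used for intermediate results.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Storage size of T in bits, as reported by sizeof.
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * CHAR_BIT;
}

// Number of base-2 mantissa digits that a stored T can hold.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}
```

On a typical x86 system, `storage_bits<float>()` is 32 and `storage_bits<double>()` is 64, with 24 and 53 mantissa bits respectively.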

What your tests seem to indicate is that g++ uses separate sizes for float and double (32 and 64 bits respectively) while MinGW32 on your Windows system uses the same size for both. Both versions are standard conforming and neither behaviour can be relied upon in general.

answered Oct 09 '11 at 08:23

Agentlien


See my comment on suresh's post below. The sizes used on my Windows system for float and double are different. – Ben Oct 09 '11 at 08:30
@Agentlien: On both systems, `float` is 4 bytes and `double` is 8 bytes. With optimisation turned off, MinGW behaves just like Unix. – TonyK Oct 09 '11 at 09:27
