Double and float may have the same precision?

Question

I have to implements a program that calculate the machine epsilon for float and double.
I wrote these functions:

int feps(){
    //machine epsilon for float
    float tmp=1;
    int d=0;
    while(1+(tmp=tmp/2)>1.0f)d++;
    return d;
}

int deps(){
    //machine epsilon for double
    double tmp=1;
    int d=0;
    while(1+(tmp=tmp/2)>1.0)d++;
    return d;
}

Note:
64 bit machine compiler gcc 4.9.1 target: x86_64-linux-gnu
32 bit machine compiler gcc 4.8.2 target: i686-linux-gnu

I tried it in a 64 bit machine and the results are:
Float 23
Double 52
As I expected, then I tried it in a 32 bit virtual machine and the results were very strange:
Float 63
Double 63
I also tried to compile my program with -mpc32, -mpc64 and -mpc80 and these are the result:
-mpc32 Float 23, Double 23
-mpc64 Float 52, Double 52
-mpc80 Float 63, Double 63
I also tried these compilation option in the 64 bit machine but the results were always 23 and 52.
I know that float are single precision and double are double precision but it's possible that the compiler of my 32 bit virtual machine use a binary80 format for both float and double?

I'm quite sure my code is correct so I think that the problem is something related with compiler or something more subtle.
I've spend the entire day searching information about floating point and I've read something about MMX/SSE instruction but I didn't understand a lot, and something about x87 FPU that may create some problem.

Update:
I want to thank everyone who helped me, I managed to get the real epsilon value for float and double in the 32 bit virtual machine, that's the code:

int feps(){
    float tmp=1;
    int d=0;
    float tmp2=1;
    do{
        tmp2=1+(tmp=tmp/2);
        d++;
    }while(tmp2>1.0f);
    return d-1;
}

int deps(){
    double tmp=1;
    int d=0;
    double tmp2=1;
    do{
        tmp2=1+(tmp=tmp/2);
        d++;
    }while(tmp2>1.0);
    return d-1;
}

as you can see we need to put the intermediate result into a variable, in that way we can prevent that 1+(tmp=tmp/2) is evaluated as a long double in the cycle test.

It is compiler and implementation specific. Read http://floating-point-gui.de/ — Basile Starynkevitch, Mar 15 '15 at 21:50
perhaps it's something to do with your virtual machine's processing of floating point instructions — M.M, Mar 15 '15 at 21:54
On your x86 machine [FLT_EVAL_METHOD](http://en.cppreference.com/w/c/types/limits/FLT_EVAL_METHOD) is 2, but on x86_64 it is 0. See the linked site for descriptions. — cremno, Mar 15 '15 at 21:55
Try `printf("%zu\n",sizeof(float));`. That won't tell you what epsilon is, but will tell you how many bytes a `float` has on the virtual machine. — user3386109, Mar 15 '15 at 22:00
@cremno the assignment should force the result to be converted to `float` so I don't quite see how that would be a complete explanation — M.M, Mar 15 '15 at 22:05
Try printing the `FLT_EPSILON` and `DBL_EPSILON` constants defined in `` on all these platforms. `float` and `double` may actually be the same type. — chqrlie, Mar 15 '15 at 22:17
I've tried the printf with %zu and in every case the result is 4 bytes for float and 8 bytes for double. — LolloFake, Mar 15 '15 at 22:17
@LolloFake: How do you compile your code? What options do you use? — cremno, Mar 15 '15 at 22:26
@cremno gcc es3.c for the 64 bit machine, for the 32 bit machine I've tried gcc -mpcXX es3.c where XX is 32,64 or 80. I've also printed FLT_EPSILON and DBL_EPSILON, the values are different. — LolloFake, Mar 15 '15 at 22:30
What *may* help is declaring `tmp` as `volatile`. This should prevent the variable from being "cached" in a register and can avoid issues with excess precision when computing on x87 FPUs. — njuffa, Mar 16 '15 at 01:45
You may be getting fooled by values being kept in a floating point register. — David Schwartz, Mar 16 '15 at 04:40

Pascal Cuoq · Accepted Answer · 2015-03-18T06:41:47.030

3

On the 32-bit platform, ABI constraints make it simpler to use historical floating-point registers; as a consequence, the compiler defines FLT_EVAL_METHOD as 2. This is how you get:

Float 63
Double 63

In short, when FLT_EVAL_METHOD is defined to 2 by the compiler, as is the case on your 32-bit virtual machine, floating-point expressions and constants are evaluated to the precision of long double, regardless of their types, and only assignments to lvalues and explicit casts round the computed values from long double to the actual floating-point type. There are no such constructs at toplevel of the expression 1+(tmp=tmp/2), so the addition is evaluated to the precision of long double.

This two-post series shows some examples on which FLT_EVAL_METHOD makes a difference in addition to yours. GCC's behavior is deterministic, and according to the explanation laid out by J.S.Myers. Clang's behavior is nondeterministic (then and now) and developers have little interest in improving this mode of their compiler.

edited Mar 18 '15 at 06:41

answered Mar 16 '15 at 04:29

Pascal Cuoq

79,187
7
161
281

`tmp` is an lvalue, so `tmp = tmp/2` is an assignment to lvalue, which should force the rounding. The result of an *assignment-expression* has the type of the left-hand operand (not the raw result of the right-hand-side) – M.M Mar 17 '15 at 22:54
@MattMcNabb Yes, but `float` has way enough range to represent `long double`'s machine epsilon (and indeed it does in the OP's program). The important thing is that `+` in `1+…` is a double-extended addition between the double-extended 1 and the conversion to double-extended of the value of `tmp`. – Pascal Cuoq Mar 18 '15 at 06:41
@MattMcNabb I have rephrased the sentence you were referring to. – Pascal Cuoq Mar 18 '15 at 06:42
could you explain in more detail (perhaps with standard references) why a "top level" assignment is different to another assignment? – M.M Mar 18 '15 at 14:24
@MattMcNabb in `1.0f + (y = e)`, the rule that assignments and casts remove the extra precision when `FLT_EVAL_METHOD > 0` only applies to `y = e`. The addition is still done with the extra precision implied by the value of `FLT_EVAL_METHOD`. This is all I mean. – Pascal Cuoq Mar 18 '15 at 14:52
OK. I was assuming that `tmp/2` is the main purpose of this loop and the `+1` made no difference, but in fact perhaps the `+` is also significant. – M.M Mar 18 '15 at 23:04

Double and float may have the same precision?

1 Answers1