Machine epsilon calculation is different using C11 and GNU11 compiler flags

Question

When using Python & Julia, I can use a neat trick to investigate machine epsilon for a particular floating point representation.

For example, in Julia 1.1.1:

julia> 7.0/3 - 4/3 - 1
2.220446049250313e-16 

julia> 7.0f0/3f0 - 4f0/3f0 - 1f0
-1.1920929f-7

I'm currently learning C and wrote this program to try and achieve the same thing:

#include <stdio.h>

int main(void)
{
  float foo;
  double bar;

  foo = 7.0f/3.0f - 4.0f/3.0f - 1.0f;
  bar = 7.0/3.0 - 4.0/3.0 - 1.0;

  printf("\nM.E. for float: %e \n\n", foo);
  printf("M.E. for double: %e \n\n", bar);

  return 0;
}

Curiously, the answer I get depends on whether I use C11 or GNU11 compiler standard. My compiler is GCC 5.3.0, running on Windows 7 and installed via MinGW.

So in short, when I compile with: gcc -std=gnu11 -pedantic begin.c I get:

M.E. for float: -1.192093e-007

M.E. for double: 2.220446e-016

as I expect, and matches Python and Julia. But when I compile with: gcc -std=c11 -pedantic begin.c I get:

M.E. for float: -1.084202e-019

M.E. for double: -1.084202e-019

which is unexpected. I thought it might by GNU specific features which is why I added the -pedantic flag. I have been searching on google and found this: https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html but I still am unable to explain the difference in behaviour.

To be explicit, my question is: Why is the result different using the different standards?

Update: The same differences apply with C99 and GNU99 standards.

So what exactly are you asking? I don't see a question here (a sentence with a question mark at the end). — S.S. Anne, Aug 18 '19 at 18:24
Can't replicate. This really, really, really, really loks like you added `-O1` to the second compilation. The `\M` is invalid. — KamilCuk, Aug 18 '19 at 18:25
The neatest trick for getting the floating-point epsilon in C is to insert `#include ` and then use `FLT_EPSILON` or `DBL_EPSILON`. — Eric Postpischil, Aug 18 '19 at 18:29
Then post assembly code of your program from both cases. Your compiler produces (should produce?) [exactly the same](https://godbolt.org/z/nVUh1k) assembly code in both cases. Something else changed, you made an error, are not telling the whole story, etc. — KamilCuk, Aug 18 '19 at 18:29
Kamil Cuk -- I was your curious about your comment. I tried both standards with all flags O1, O2, O3 and I get the exact same results. — Moustache, Aug 18 '19 at 18:29
GCC 5.3 on MinGW with Windows 7 is super old. Use the latest GCC 8 from MinGW-w64. — S.S. Anne, Aug 18 '19 at 18:32
The C standard does not require implementations to evaluate floating-point expressions in their nominal type. They are allowed to use more precision. Aside from the problems you may be having reproducing the issue controllably, you would need to control the precision by changing the expression to `(double) ((double) 7./3. - (double) 4./3.) - 1`. In C, casts and assignments are required to “discard” excess precision. — Eric Postpischil, Aug 18 '19 at 18:32
Additionally, it is not a neat trick. At first glance, it looks like it relies on the floating-point representation using a binary base and the specific patterns produced by the fraction ⅓. It would be bad practice to use that without documenting how it behaves, why, and the requirements for it (preferably with a static assert that `FLT_RADIX == 2`). Once you are done putting all that into the source code, it is still a trick, but it is not neat. Just use `FLT_EPSILON` or `DBL_EPSILON`. — Eric Postpischil, Aug 18 '19 at 18:33
Kamil Cuk, here is the assembly as requested in case it helps diagnose the situation: https://gist.github.com/moustachio-belvedere/b239452a26b21b3da5dbf64500d7e59f and https://gist.github.com/moustachio-belvedere/d7d1b1dafc813031de9407d1cc501547 — Moustache, Aug 18 '19 at 18:35
Kamil Cuk, apologies the \m was a typo when typing the question. The code is \nM, I have updated the question to reflect the true code. — Moustache, Aug 18 '19 at 18:36
Eric Postipischil, thank you I updated the code according to your explanation in the first comment and it gives the expected result. If you write it up as an answer I will accept it as the solution. It is exactly the explanation I was looking for. — Moustache, Aug 18 '19 at 18:45
For the other sceptical commenters, this is the final code which gives me expected results under both standards: https://gist.github.com/moustachio-belvedere/d9aeca8037a007c9ed6de1aaa804475b — Moustache, Aug 18 '19 at 18:47
I think this is quite answerable question. There has to be a reason for the difference. @Moustache you should link the assemblies *prominently* into the question - and if you can reproduce this exact output with Godbolt, then link that instead! — Antti Haapala -- Слава Україні, Aug 18 '19 at 18:52
Note my expression with casts above needs parentheses around the divisions. The version in my answer corrects this. — Eric Postpischil, Aug 18 '19 at 20:14

score 4 · Accepted Answer · answered Aug 18 '19 at 19:41

4

In C, the best way to get the float or double epsilon is to include <float.h> and use FLT_MIN or DBL_MIN.

The value of 7.0/3.0 - 4.0/3.0 - 1.0; is not fully specified by the C standard because it allows implementations to evaluate floating-point expressions with more precision than the nominal type. To some extent, this can be dealt with by using casts or assignments. The C standard requires casts or assignments to “discard” excess precision. This is not a proper solution in general, because there can be rounding both with the initial excess precision and with the operation that “discards” excess precision. This double-rounding may produce a different result than calculating entirely with the nominal precision.

Using the cast workaround with the code in the question yields:

_Static_assert(FLT_RADIX == 2, "Floating-point radix must be two.");
float FloatEpsilon = (float) ((float) (7.f/3) - (float) (4.f/3)) - 1;
double DoubleEpsilon = (double) ((double) (7./3) - (double) (4./3)) - 1;

Note that a static assertion is required to ensure that the floating-point radix is as expected for this kludge to operate. The code should also include documentation explaining this bad idea:

The binary representation for the fraction ⅓ ends in an infinite sequences of “01010101…”.
When the binary for 4/3 or 7/3 is rounded to a fixed precision, it is as if the numeral were truncated and rounded down or up, depending on whether the next binary digit after truncation were a 0 or a 1.
Given our assumption that floating-point uses a base-two radix, 4/3 and 7/3 are in consecutive binades (4/3 is in [1, 2), and 7/3 is in [2, 4). Therefore, their truncation points are one position apart.
Thus, we converting to a binary floating-point format, 4/3 and 7/3 differ in that the latter exceeds the former by 1 and its significand ends one bit sooner. Examination of the possible truncation points reveals that, aside from the initial difference of 1, the significands differ by the value of the position of the low bit in 4/3, although the difference may be in either direction.
By Sterbenz’ Lemma, there is no floating-point error in subtracting 4/3 from 7/3, so the result is exactly 1 plus the difference described above.
Subtracting 1 produces that difference, which is the value of the position of the low bit of 4/3 except that it may be positive or negative.

answered Aug 18 '19 at 19:41

Eric Postpischil

195,579
13
168
312

"the C standard... allows implementations to evaluate floating-point expressions with more precision than the nominal type" -- Is there a specific technical term which describes this behaviour? I would like to try and read more into the GNU standard as it doesn't appear to do that based on my small investigations today. – Moustache Aug 18 '19 at 20:08
1

@Moustache: There is not a defined term for that. It arises from C 2018 5.2.4.2.2 10, which says “Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of `FLT_EVAL_METHOD`:…” – Eric Postpischil Aug 18 '19 at 20:12
3

@Moustache: Essentially, this license was given to implementations to allow them to use the native hardware for calculations without requiring them to emulate a narrower type. For example, a machine might have floating-point instructions and registers for some `double` type along with instructions for converting to `float` but no instructions for calculating directly in `float`. This license from the C standard allows those implementations to conform to the standard without making them emulate perfect `float` instructions. – Eric Postpischil Aug 18 '19 at 20:14
4

Makes perfect sense now that you have explained. Indeed, I just found this online: *"The default when in a standards compliant mode (-std=c11 or similar) is -fpermitted-flt-eval-methods=c11. The default when in a GNU dialect (-std=gnu11 or similar) is -fpermitted-flt-eval-methods=ts-18661-3"* from https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html – Moustache Aug 18 '19 at 20:26
1

@Moustache: As I read it, that tells us the **reporting** of the floating-point evaluation method changes based on the dialect. It does not say the method itself changes. You might insert `printf("%d\n", FLT_EVAL_METHOD);` and run the program with gnu11 and c11 to see what it reports. – Eric Postpischil Aug 18 '19 at 22:02
@EricPostpischil: Given that C is allowed to use a type with higher range and precision; can C pretend it used infinite precision and therefore replace the expression `7.0/3.0 - 4.0/3.0 - 1.0` with the exact constant `0.0`? – Brendan Aug 18 '19 at 23:14
@Brendan: Yes, that is allowed. – Eric Postpischil Aug 19 '19 at 00:37
Interesting @EricPostpischil -- evaluating https://gist.github.com/moustachio-belvedere/b104ba1a63f746e031935a7c6637f4f5 I get FLT_EVAL_METHOD==2 for both `gnu11` and `c11`. Perhaps not what I expected – Moustache Aug 19 '19 at 21:20
1

@Moustache: Yes, 2 means everything should be evaluated using `long double`, so it being the same in both would not explain a difference. – Eric Postpischil Aug 21 '19 at 00:04

Machine epsilon calculation is different using C11 and GNU11 compiler flags

1 Answers1