
When comparing doubles for equality, we need to use a tolerance, because floating-point computation can introduce errors. For example:

#include <cmath>  // for fabs

double x = f();
double y = g();

if (fabs(x - y) < epsilon) {
   // they are equal!
} else {
   // they are not!
}

However, if I simply assign a constant value, without any computation, do I still need to compare with an epsilon?

double x = 1;
double y = 1;

if (x==y) {
   // they are equal!
} else {
   // no they are not!
}

Is == comparison good enough, or do I need to use fabs(x-y) < epsilon again? Can an assignment itself introduce error? Am I being too paranoid?

How about casting (double x = static_cast<double>(100))? Could that introduce floating-point error as well?

I am using C++ on Linux, but if the answer differs by language, I would like to understand that as well.

CuriousMind
  • Whether you need an epsilon depends on the situation. E.g. when you need a transitive equality (`a==b && b==c` implies `a==c`), then you may not use an epsilon. BTW, `double x = 1` already means `double x = static_cast<double>(1)` – MSalters Mar 23 '12 at 08:16

5 Answers


Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:

If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get that value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now, I'd be very worried about an implementation that chose a different close match for the same input token, but it is possible.
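
For illustration, here's a minimal sketch (assuming an IEEE 754 binary64 double, which std::numeric_limits<double>::is_iec559 reports) showing that 1 and 100 round-trip exactly while 0.1 gets a close match:

#include <cstdio>
#include <limits>

int main() {
    // Sanity check: is double an IEEE 754 binary64?
    std::printf("IEEE 754: %d\n", std::numeric_limits<double>::is_iec559);

    // 1 and 100 are exactly representable, so these hold.
    std::printf("%d\n", 1.0 == static_cast<double>(1));      // 1
    std::printf("%d\n", 100.0 == static_cast<double>(100));  // 1

    // 0.1 is not exactly representable in binary; printed with enough
    // digits, the stored close match becomes visible.
    std::printf("%.17g\n", 0.1);  // e.g. 0.10000000000000001
}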


(a) Actually, that paragraph can be read in two ways: either the implementation is free to choose the closest higher or closest lower value regardless of which is actually closer, or it must choose whichever is nearest to the desired value.

If the latter, it doesn't change this answer, however, since all you have to do is hardcode a floating-point value exactly at the midpoint of two representable values, and the implementation is once again free to choose either.

For example, it might alternate between the next higher and next lower value for the same reason banker's rounding is applied: to reduce cumulative error.

paxdiablo
  • Some implementations seem to evaluate `float` literals by finding the `double` value nearest to a `float`, and then rounding that to a `float`. This may sometimes cause `float` literals whose actual value is slightly above or below a `double` that is exactly between two adjacent `float` values to be assigned a value which is not the closest. – supercat Jun 18 '14 at 18:14

No, if you assign literals they will be the same :)

Also, if you start with the same value and do the same operations, they should be the same.

Floating-point values are inexact, but the operations should produce consistent results :)
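
A quick sketch of that point; assuming both expressions are compiled the same way (no differing fast-math or excess-precision settings), the results are bit-identical:

#include <iostream>

int main() {
    // The same inputs run through the same operations, in the same order.
    double a = (0.1 + 0.2) * 3.0;
    double b = (0.1 + 0.2) * 3.0;

    std::cout << (a == b) << '\n';  // 1: identical computations, identical results
}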

joshuahealy

Both cases are ultimately subject to implementation-defined representations.

Storage of floating-point values and their representations take many forms: loaded by address or as a constant? Optimized out by fast math? What is the register width? Is it stored in an SSE register? Many variations exist.

If you need precise behavior and portability, do not rely on this implementation defined behavior.
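
As a toolchain-dependent illustration of why register width matters: on 32-bit x86 builds using the 80-bit x87 registers, a value kept in a register can compare unequal to the "same" value after it has been spilled to a 64-bit memory slot. This is only a sketch; with SSE (the x86-64 default) or -ffloat-store you will typically see 1:

#include <cstdio>

double thirds(double a, double b) { return a / b; }

int main() {
    // volatile forces a round trip through a 64-bit memory slot,
    // discarding any extra x87 register precision.
    volatile double stored = thirds(1.0, 3.0);
    double direct = thirds(1.0, 3.0);

    // Typically prints 1; may print 0 on x87 builds that keep `direct`
    // at 80-bit precision while `stored` was rounded to 64 bits.
    std::printf("%d\n", stored == direct);
}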

justin

IEEE-754, which is a standard that common floating-point implementations abide by, requires floating-point operations to produce the representable value nearest to the infinitely precise result. Thus the only imprecision you will face is rounding after each operation you perform, plus the propagation of rounding errors from operations performed earlier in the chain. Floats are not inexact per se. And by the way, epsilon can and should be computed; you can consult any numerics book on that.
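
For instance, a minimal sketch of such a computed relative tolerance; the ulps factor here is an assumption, to be derived from the error analysis of your actual computation:

#include <algorithm>
#include <cmath>
#include <limits>

// Relative comparison: tolerate `ulps` rounding steps at the operands' scale.
bool nearlyEqual(double a, double b, double ulps = 4.0) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= scale * ulps * std::numeric_limits<double>::epsilon();
}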

Floating-point numbers can represent integers exactly up to the width of their mantissa (53 bits for a double). So, for example, a cast from an int to a double is always exact, but a cast into a float is no longer exact for very large integers.
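
A quick demonstration (assuming the usual IEEE 754 binary64 double with a 53-bit significand and binary32 float with a 24-bit significand):

#include <cstdint>
#include <iostream>

int main() {
    // Integers up to 2^53 are exact in a double; beyond that, gaps appear.
    double big = 9007199254740992.0;          // 2^53
    std::cout << (big + 1.0 == big) << '\n';  // 1: 2^53 + 1 rounds back to 2^53

    // A float's 24-bit significand cannot hold this integer exactly.
    std::int32_t n = 100000001;
    float f = static_cast<float>(n);
    std::cout << (static_cast<std::int32_t>(f) == n) << '\n';  // 0
}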

There is one major example of extensive use of floating-point numbers as a substitute for integers: the Lua scripting language, which has no built-in integer type and uses floating-point numbers extensively for logic, flow control, and so on. The performance and storage penalty of using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time, and it makes the implementation lighter. Lua has been used extensively not only on PCs, but also on game consoles.

Now, many compilers have an optional switch that disables IEEE-754 compliance (e.g. -ffast-math in GCC). Then compromises are made: denormalized numbers (very, very small numbers where the exponent has reached its smallest possible value) are often treated as zero, and approximations may be made in the implementations of power, logarithm, sqrt, and 1/(x^2), but addition/subtraction, comparison, and multiplication should retain their properties for numbers that can be represented exactly.
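
A small sketch of the denormal point; what it prints depends on whether flush-to-zero is in effect (e.g. under GCC's -ffast-math on some targets):

#include <cfloat>
#include <cstdio>

int main() {
    // Dividing the smallest normal double by 4 yields a subnormal number.
    double denorm = DBL_MIN / 4.0;

    // 1 under strict IEEE 754; may print 0 where subnormals are flushed to zero.
    std::printf("%d\n", denorm > 0.0);
    std::printf("%g\n", denorm);  // e.g. 5.56268e-309
}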

3yE

The easy answer: for constants, == is OK. There are two exceptions you should be aware of:

First exception:

0.0 == -0.0

There is a negative zero which compares equal under the IEEE 754 standard. This means 1/INFINITY == 1/-INFINITY, which breaks f(x) == f(y) => x == y

Second exception:

NaN != NaN

This is a special property of Not-a-Number which lets you find out whether a number is a NaN on systems that do not provide a test function (yes, that happens).
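
Both exceptions in one small sketch, assuming IEEE 754 semantics (std::numeric_limits<double>::is_iec559):

#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    // Exception 1: signed zero. The two zeros compare equal...
    std::printf("%d\n", 0.0 == -0.0);              // 1
    // ...yet dividing by them exposes the difference: +inf vs -inf.
    std::printf("%d\n", 1.0 / 0.0 == 1.0 / -0.0);  // 0

    // Exception 2: NaN compares unequal to everything, itself included.
    double nan = std::numeric_limits<double>::quiet_NaN();
    std::printf("%d\n", nan != nan);       // 1: the self-comparison trick
    std::printf("%d\n", std::isnan(nan));  // 1: the portable test
}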

Thorsten S.