
I know that floating point math can be ugly at best, but I am wondering if somebody can explain the following quirk. In most of the programming languages I tested, the addition of 0.4 to 0.2 gave a slight error, whereas 0.4 + 0.1 + 0.1 gave none.

What is the reason for the difference between the two calculations, and what measures can one take in the respective programming languages to obtain correct results?

In Python 2/3:

>>> .4 + .2
0.6000000000000001
>>> .4 + .1 + .1
0.6

The same happens in Julia 0.3:

julia> .4 + .2
0.6000000000000001

julia> .4 + .1 + .1
0.6

and Scala:

scala> 0.4 + 0.2
res0: Double = 0.6000000000000001

scala> 0.4 + 0.1 + 0.1
res1: Double = 0.6

and Haskell:

Prelude> 0.4 + 0.2
0.6000000000000001    
Prelude> 0.4 + 0.1 + 0.1
0.6

but R v3 gets it right:

> .4 + .2
[1] 0.6
> .4 + .1 + .1
[1] 0.6
vchuravy
  • Some languages hide the truth by rounding for display purposes. – DavidO Feb 19 '14 at 06:37
  • Actually, R is just hiding it from you: run `format(.4 + .1 + .1, digits=17)`, `format(.4 + .2, digits=17)`. – tonytonov Feb 19 '14 at 06:38
  • These results are as correct as possible in IEEE notation. – Guntram Blohm Feb 19 '14 at 06:38
  • The reason is because it's being rounded up at the end – Nowayz Feb 19 '14 at 06:41
  • The very word "equality" is inappropriate for floating-point numbers. As a general rule you should _never_ expect it to hold for any two given such numbers, consider any exceptions (precise integers or negative powers of two) merely flukes. As long as you only _compare_ numbers with `>`, or e.g. plot something (which is very often completely sufficient), floats work great. – leftaroundabout Feb 19 '14 at 11:44
  • At best floating-point math is far from ugly; it has proven beautiful enough to land a man on the moon, to model the human heart in action, and to peer into the furthest depths of the universe. Any ugliness is in the eye of the (myopic, astigmatic) beholder. – High Performance Mark Feb 19 '14 at 14:29
  • @Mark Floating point is wonderful; it just doesn't have exact equality defined. Languages which give that to the programmer are committing small lies. Turns out that many real-world situations don't have exact equality either. – J. Abrahamson Feb 19 '14 at 15:05
  • @leftaroundabout This is superstition. Floating-point equality works fine. It is most of the other operations that aren't exact in all circumstances, but if you know that you have used these operations in conditions where they are exact, for instance, you can very well use equality as part of a floating-point algorithm. The function here to convert a float to int without using a conversion relies on floating-point equality: http://blog.frama-c.com/index.php?post/2013/05/01/A-conversionless-conversion-function2 – Pascal Cuoq Feb 19 '14 at 17:58
  • @J.Abrahamson Equality is one of the few floating-point operators that behaves the most like the corresponding math operation. Equality is not the problem, all the other operations are. Do not think that you will automatically write floating-point programs that work by simply avoiding equality: you won't, because the floating-point equality operator is not the problem. – Pascal Cuoq Feb 19 '14 at 18:01
  • @PascalCuoq: equality by itself already behaves not like the corresponding maths operation (`NaN == NaN` is false), but that's not the real issue. What I mean has nothing to do with the particular equality operation, but with even thinking about equalness when dealing with floats. In all good uses I'm aware of, floats approximate _real numbers_. The traditional maths background to those relies a lot on nonconstructive proofs and is therefore not suitable for computing purposes (not that scientists or engineers cared). Alternatives such as [ASD](http://www.paultaylor.eu/ASD/) have no equality. – leftaroundabout Feb 19 '14 at 21:46
  • @PascalCuoq leftroundabout covered essentially what I wanted to say. Under the general notions that floating point is trying to model, you might as well throw out equality. It truly does have a well-defined notion of equality, but if you're using it in, say, scientific code then you've shot yourself in the foot. – J. Abrahamson Feb 19 '14 at 22:24

3 Answers


All these languages are using the system-provided floating-point format, which represents values in binary rather than in decimal. Values like 0.2 and 0.4 can't be represented exactly in that format, so instead the closest representable value is stored, resulting in a small error. For example, the numeric literal 0.2 results in a floating-point number whose exact value is 0.200000000000000011102230246251565404236316680908203125. Similarly, any given arithmetic operation on floating-point numbers may result in a value that's not exactly representable, so the true mathematical result is replaced with the closest representable value. These are the fundamental reasons for the errors you're seeing.
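For instance, here is one way to see the exact value that actually gets stored (a small sketch in Python, one of the languages from the question; the `decimal` module is used here only to print the stored double exactly):

from decimal import Decimal

# Converting the float 0.2 to Decimal reveals the exact binary64 value it is stored as.
print(Decimal(0.2))
# 0.200000000000000011102230246251565404236316680908203125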

However, this doesn't explain the differences between languages: in all of your examples, the exact same computations are being made and the exact same results are being arrived at. The difference then lies in the way that the various languages choose to display the results.

Strictly speaking, none of the results you show is correct. Making the (fairly safe) assumption of IEEE 754 binary64 arithmetic with a round-to-nearest rounding mode, the exact value of the first sum is:

0.600000000000000088817841970012523233890533447265625

while the exact value of the second sum is:

0.59999999999999997779553950749686919152736663818359375
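
Both values can be checked from Python (again a sketch, using the `decimal` module only to display the stored results exactly):

from decimal import Decimal

print(Decimal(0.4 + 0.2))
# 0.600000000000000088817841970012523233890533447265625
print(Decimal(0.4 + 0.1 + 0.1))
# 0.59999999999999997779553950749686919152736663818359375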

Neither of those values is particularly user-friendly, so all of the languages you tested sensibly abbreviate the output when printing. However, they don't all adopt the same strategy for formatting the output, which is why you're seeing differences.

There are many possible strategies for formatting, but three particularly common ones are (a short formatting sketch follows the list):

  1. Compute and display 17 correctly-rounded significant digits, possibly stripping trailing zeros where they appear. The output of 17 digits guarantees that distinct binary64 floats will have distinct representations, so that a floating-point value can be unambiguously recovered from its representation; 17 is the smallest integer with this property. This is the strategy that Python 2.6 uses, for example.

  2. Compute and display the shortest decimal string that rounds back to the given binary64 value under the usual round-ties-to-even rounding mode. This is rather more complicated to implement than strategy 1, but preserves the property that distinct floats have distinct representations, and tends to make for pleasanter output. This appears to be the strategy that all of the languages you tested (besides R) are using.

  3. Compute and display 15 (or fewer) correctly-rounded significant digits. This has the effect of hiding the errors involved in the decimal-to-binary conversions, giving the illusion of exact decimal arithmetic. It has the drawback that distinct floats can have the same representation. This appears to be what R is doing. (Thanks to @hadley for pointing out in the comments that there's an R setting which controls the number of digits used for display; the default is to use 7 significant digits.)
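
As a rough illustration of how much the display strategy alone changes what you see, here is a Python sketch (`repr` in Python 2.7/3.1+ implements the shortest round-tripping string of strategy 2, and `%g` formatting is used to mimic fixed-significant-digit display):

x = 0.4 + 0.2

print("%.17g" % x)   # strategy 1: 17 significant digits        -> 0.60000000000000009
print(repr(x))       # strategy 2: shortest round-tripping form -> 0.6000000000000001
print("%.15g" % x)   # strategy 3: 15 significant digits        -> 0.6 (the error is hidden)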

Mark Dickinson
  • @hadley: Thanks. I was trying to find that information in the documentation; do you have a doc link handy? – Mark Dickinson Feb 19 '14 at 14:17
  • In R, ?options, under 'digits'. Online at http://stat.ethz.ch/R-manual/R-patched/library/base/html/options.html – Gray Feb 19 '14 at 15:25
  • Excellent explanation. Printing binary floating-point values in the least number of decimal digits required to reproduce the same value on input is a surprisingly difficult problem. An efficient algorithm that doesn't need arbitrary precision arithmetic was only [published in 2010 by Florian Loitsch](http://florian.loitsch.com/publications/dtoa-pldi2010.pdf). Julia uses the excellent [double-conversion library](https://code.google.com/p/double-conversion/) which Florian developed for the V8 JavaScript engine. – StefanKarpinski Feb 20 '14 at 04:28
  • @StefanKarpinski It still needs arbitrary precision for some cases (from the referenced paper: "... roughly 99.5% are processed correctly and are thus guaranteed to be optimal (with respect to shortness and rounding). The remaining 0.5% are rejected and need to be printed by another printing algorithm (like Dragon4)."). – Rick Regan Feb 20 '14 at 17:09
  • Yes, that's true. Or you can give up very slightly on perfectly optimal printing and do without it. – StefanKarpinski Feb 20 '14 at 20:18
  • "However, this doesn't explain the differences between languages": true and false ;-) Another reason besides rounding for display is the order of calculations done behind the scenes. For this particular question there's no arguing. But in the more general case of arbitrary expressions there's a difference in how languages internally optimize or rearrange computation. Is this worthwhile to add to this answer? – cfi Sep 05 '15 at 13:10

You should be aware that 0.6 cannot be exactly represented in IEEE floating point, and neither can 0.4, 0.2, and 0.1. This is because the ratio 1/5 is an infinitely repeating fraction in binary, just like ratios such as 1/3 and 1/7 are in decimal. Since none of your initial constants is exact, it is not surprising that your results are not exact, either. (Note: if you want to get a better handle on this lack of exactness, try subtracting the value you expect from your computed results...)
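
For example (a quick Python sketch; any of the languages in the question will show the same behaviour):

print(0.4 + 0.2 - 0.6)        # 1.1102230246251565e-16: the rounding errors do not cancel
print(0.4 + 0.1 + 0.1 - 0.6)  # 0.0: this sum happens to round to the same double as the literal 0.6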

There are a number of other potential gotchas in the same vein. For instance, floating point arithmetic is only approximately associative: adding the same set of numbers together in different orders will usually give you slightly different results (and occasionally can give you very different results). So, in cases where precision is important, you should be careful about how you accumulate floating point values.
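
A small Python sketch of both points, with `math.fsum` shown as one of the more careful ways to accumulate (it computes the correctly-rounded sum of its inputs):

import math

print((0.1 + 0.2) + 0.3)      # 0.6000000000000001
print(0.1 + (0.2 + 0.3))      # 0.6 -- same operands, different grouping, different result

print(sum([0.1] * 10))        # 0.9999999999999999 -- naive left-to-right accumulation
print(math.fsum([0.1] * 10))  # 1.0 -- correctly rounded sum of the stored values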

The usual advice for this situation is to read "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg. The gist: floating point is not exact, and naive assumptions about its behavior may not be supported.

comingstorm

The reason is that the result gets rounded at the end, according to the IEEE Standard for Floating-Point Arithmetic:

http://en.wikipedia.org/wiki/IEEE_754

According to the standard, addition, subtraction, multiplication, and division must be correctly rounded: the computed result has to equal the exact mathematical result rounded to the nearest representable value. A computer has only a finite amount of space to represent these values, so it cannot store the unlimited precision the exact result would sometimes need.
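
One way to see the "correctly rounded" guarantee in action (a Python sketch; `fractions.Fraction` is used only to compute the exact sum of the two stored operands):

from fractions import Fraction

a, b = 0.4, 0.2
exact = Fraction(a) + Fraction(b)   # exact sum of the two stored binary64 values
print(float(exact) == a + b)        # True: the hardware sum is that exact value, rounded to the nearest double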

Nowayz
  • "cannot infinitely trail the zeros" - well, that's easy enough. An infinite number of 0s takes 0 space to store with an efficient encoding, since it contains 0 information. The problem is storing an infinite trail of mixed 0s and 1s. – user2357112 Feb 19 '14 at 06:59
  • Seriously, the phrase “cannot infinitely trail the zeros” does not make any sense. All numbers in IEEE 754 format have infinite trailing zeroes in decimal **and** in binary, so it is clearly possible to represent numbers with this property. – Pascal Cuoq Feb 19 '14 at 08:53
  • @PascalCuoq There you go, I fixed the wording for you – Nowayz Jan 08 '15 at 04:01