
If I use Ruby and evaluate 0.1 + 0.2 - 0.3:

3.1.2 :001  > 0.1 + 0.2 - 0.3
 => 5.551115123125783e-17 

And the digits can be shown this way:

3.1.2 :001 > "%0122.120f" % [0.1 + 0.2 - 0.3]
 => "0.000000000000000055511151231257827021181583404541015625000000000000000000000000000000000000000000000000000000000000000000"

The question is: using IEEE 754, it gives a non-zero result:

3.1.2 :002 > 0.1 + 0.2 - 0.3 == 0
 => false 

But what if it were strictly traditional floating point, where we store a significand and an exponent (or what Donald Knuth called a fraction and an exponent)? Would it then strictly give 0?

nonopolarity
  • Most of the quirks of IEEE-754 floating point can be attributed to the fact that it's trying to approximate a base-10 number using powers of two. If your "traditional" floating-point numbers are stored as base 10, then yes, your argument holds true. Read [What every computer scientist should know about floating-point arithmetic](https://docs.oracle.com/cd/E19957-01/800-7895/800-7895.pdf) for more information. – Robert Harvey Aug 14 '22 at 02:01
  • Re "strictly traditional floating point, where we store a significand and exponent": IEEE-754 does store a significand and an exponent. – Eric Postpischil Aug 14 '22 at 11:42

2 Answers


To start with, there is no such thing as "traditional floating point" unless you specify the exact tradition. Before IEEE754, there were ~60 different floating-point implementations in hardware or popular libraries, each with its own fixed format; you can see a nice survey, for example, here - and I'm not certain some implementations aren't lost. So, which one is "traditional"? ;) As @ChrisDodd pointed out, IEEE754 can now be called "traditional" because it has been in wide use since the early 1980s and in nearly total use (except forced legacy) since the 1990s. A whole generation of programmers and users has been born, grown up, studied at schools and universities, and started working without ever using anything but IEEE754. (Disclaimer: I don't take into account the special formats used in machine learning, like the 16-bit bfloat16, which is 32-bit IEEE754 with a cut-down mantissa width, and other non-general-computation applications.)

But I would consider your question in a broader context. The concrete example 0.1+0.2-0.3 is nothing special. We could consider 0.3+1.1-1.4, 0.01+0.06-0.07, or millions of other cases; I spent less than a minute finding these. On the other hand, 0.01+0.02-0.03 is zero using IEEE754 double. Why? Because that's just how the stars aligned. If you use another binary floating-point implementation, it might behave the opposite way: 0.1+0.2-0.3==0, but 0.01+0.02-0.03!=0. Who knows?
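
To make this concrete, here is a quick check in the question's Ruby REPL (a transcript from one IEEE 754 binary64 build; the prompts are illustrative, but the zero/non-zero pattern is dictated by binary64 itself):

3.1.2 :001 > 0.3 + 1.1 - 1.4
 => 2.220446049250313e-16
3.1.2 :002 > 0.01 + 0.06 - 0.07
 => -1.3877787807814457e-17
3.1.2 :003 > 0.01 + 0.02 - 0.03
 => 0.0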

The main thing is that, as other commenters already said, the issue is fundamentally unavoidable with binary floating point. Any such implementation will have cases where the limited mantissa length and rounding spoil equalities that would be exact in decimal. So, if your "traditional" means pre-computer-era calculation using paper, an abacus (of any type, including wide versions), or an arithmometer - you are right, those implementations don't have this kind of error. If you mean anything else, please specify what it is.

NB: IEEE754-2008 does define decimal formats. Practically nobody uses them; I haven't seen a hardware implementation outside IBM zSeries and pSeries. The problem is the domain. Decimal fractions are typically used in financial accounting and taxation. One could use decimal floating point there, but that domain nearly everywhere requires fixed-point calculation: floating point carries a risk of silent precision loss, which is unacceptable there. Well, IBM has its own peculiar customer base, which isn't true for nearly all others.

So: start by determining your domain. If it is accounting/taxation, decimal fixed point is your best friend, and you'll never face problems like 0.1+0.2!=0.3. Otherwise, most probably, you'll be safe with IEEE754 binary, and the same issue won't matter to you even if it happens every time.
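
Staying with the question's Ruby: the standard library's BigDecimal does exact decimal arithmetic and serves as a minimal sketch of the decimal approach recommended above (prompts illustrative):

3.1.2 :001 > require 'bigdecimal'
 => true
3.1.2 :002 > BigDecimal("0.1") + BigDecimal("0.2") - BigDecimal("0.3") == 0
 => true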

Netch
  • With all the info mentioned, I wonder why no one directly addressed `0.3` being stored as significand `3` and exponent `-1`, with both the 3 and the -1 stored as binary? It is said that "decimal" is not used, but we are storing the 3 and -1 as binary, so that isn't decimal. Is the "decimal" about the 10 to the power -1 above? But what does that mean? We have to compose the 1 or 0.1 from 1/2 + 1/4 + 1/8 + 1/16, etc.? Then won't 0.3 or 0.123 be complicated to convert? Since humans all use 0.3 as decimal, why not just store 3 and exponent -1 and go with exponent base 2 – nonopolarity Aug 16 '22 at 06:55
  • @nonopolarity "0.3 being stored as significant 3 and exponent -1 and both 3 and -1 are stored as binary" is exactly what is called decimal floating point (at least, variant with binary mantissa - because some implementations utilize Chen-Ho BCD mantissa encoding). I don't understand rest of your comment - do you mean to mix different exponent bases in one number? – Netch Aug 16 '22 at 08:53
  • So you are saying IEEE 754, which is traditional floating point, stores 0.1 as 0 x 2 to the power -1, plus 0 x 2 to the power -2, plus ..., using all these fractions to try to make it add up to 0.1? If that's the case, it looks like it is using a lot of bits just to be able to make it 0.1, versus the so-called "decimal floating point" which needs just 1 bit – nonopolarity Aug 16 '22 at 09:36
  • @nonopolarity Binary (traditional) IEEE754 stores 0.1 (in double) as 3602879701896397 * 2^-55. Yep, it is 2^-4 + 2^-5 + 2^-8 + 2^-9 + 2^-12 + ... Yes, you are right, many bits are set; this is the cost of representing non-binary fractions (the snippet after these comments verifies the stored value). But binary radically simplifies the implementation and raises its accuracy. Decimal floating point requires multiplication and division by powers of 10, or a fully BCD implementation, and those are much more costly (and, again, essentially useless). – Netch Aug 16 '22 at 11:25
  • I guess I don't understand that: if we can calculate `1234567 * 9876543` fast, then how come we cannot calculate `0.1234567 * 0.9876543` fast? We can do the integer calculation and then just multiply by `1/10000000` twice. Isn't that almost the same speed? And even if it were 32 times as slow - you know what computer scientists say, the constant factor doesn't matter that much, as we may double in speed every 18 months, so 32 times as slow just means 5 * 18 months, meaning 7.5 years, and we are all ok... I would rather have accurate floating point than a fast but inaccurate one... – nonopolarity Aug 17 '22 at 11:12
  • @nonopolarity This is more complex than fits in a single comment. 1) Really, we needn't divide by 1e7 twice: if the values are represented as 1234567e0 and 9876543e0, then after multiplication a single division by 1e7 is enough to fit into a 7-digit field (as in IEEE754 decimal32). 2) But division by non-constants is slow; a decimal implementation in hardware would have to provide a set of precomputed inverse factors to implement it. 3) Doubling in speed every 18 months hasn't been real at least since 2015. Plain frequency growth stopped in 2005, and Moore's law about gate count doesn't translate into useful speed. – Netch Aug 18 '22 at 06:21
  • @nonopolarity The growth we have observed since Nehalem times is from smarter design, not extensive growth. 4) No, constant factors were always important in a competition; moreover, they become crucial as computing-power progress slows down. 5) Maybe if all CS scientists had invested in decimal floating point, quality implementations would have appeared earlier; in reality they appeared ~20 years later than their binary counterparts. 6) As already said, decimal floating point has too narrow a niche; for accounting, fixed point is better in nearly all cases. 7) Binary floating point is accurate. But it is binary :) – Netch Aug 18 '22 at 06:41
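
As a side check on the stored value quoted in the comments above: Ruby's Float#to_r returns the exact rational a Float holds, and the denominator 36028797018963968 is 2^55 (prompt illustrative):

3.1.2 :001 > 0.1.to_r
 => (3602879701896397/36028797018963968)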

IEEE 754 is traditional floating point, and it is the fact that it uses base 2 that means it cannot exactly represent 0.3, which leads to this issue.

If you were to use decimal floating point, such as the IEEE 754-2008 decimal formats, then it could exactly represent 0.3 and you would not get rounding errors in this case. You would still get rounding errors in other cases, however.
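
To illustrate with a decimal type available in the question's Ruby (the standard library's BigDecimal, standing in here for an IEEE 754-2008 decimal format): 0.3 is represented exactly, but a non-terminating decimal like 1/3 must still be rounded to some number of digits (prompts illustrative):

3.1.2 :001 > require 'bigdecimal'
 => true
3.1.2 :002 > BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3")
 => true
3.1.2 :003 > (BigDecimal("1") / 3) * 3 == 1
 => false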

Chris Dodd
  • So traditional floating point won't store the integer `1` as the significand with an exponent of `-1`, and for `0.2` store the integer `2` as the significand, etc.? Then when we add or subtract, it is just doing integer arithmetic: `1 + 2 - 3`, all with an exponent of `-1`, and therefore we get significand `0`, exponent still `-1`, and therefore `0 x (10 to the power -1)`, which is `0`? – nonopolarity Aug 14 '22 at 02:23
  • @nonopolarity how the values are stored is completely irrelevant. Some binary floating-point formats store the most significant 1, some don't. Some decimal floating-point formats normalize the significant digit to 0, some don't. As long as 0.1, 0.2 and 0.3 can be represented exactly in the target floating-point format, then `0.1 + 0.2 - 0.3` will be exact; for example, it'll be zero in base-100 or base-1000 floating-point formats. When doing operations, obviously the radix point must be aligned by the ALU/FPU automatically, and that's not something the user needs to care about – phuclv Aug 14 '22 at 03:20
  • Isn't it relevant, and is that how IEEE 754-2008 does it? I mean, if there is a floating-point library that stores 0.1 exactly, with the integer (significand) `1` and exponent `-1`, etc., then it does the calculation `1 + 2 - 3`, all with exponent `-1`, and now the answer is `0 x 10 to the power -1`, which is strictly `0` – nonopolarity Aug 14 '22 at 04:04
  • With traditional binary floating point (like IEEE-754), 0.1 will be stored as 0x1.9999... * 2^-4. Everything is done with powers of 2, not powers of 10. – Chris Dodd Aug 14 '22 at 05:55
  • If that's the way it is, that's fine, but theoretically the `1`, `2`, `3` can be, and in fact are, stored as binary too... maybe that's how IEEE 754-2008 improved it... it just makes me wonder... why don't we have the significand part as an integer just like an int, and then an exponent to float the radix point? Why do we have to go to the fractional world and work with inaccuracy to begin with? – nonopolarity Aug 14 '22 at 07:01
  • @nonopolarity: "have the significand part as an integer just like an int, and then an exponent to float the radix point" - that's exactly what IEEE 754 formats (both binary and decimal) do. E.g., the binary64 format allows you to express all integers up to 2**53 exactly, and then also every number that you get by taking one of those integers and scaling it by a power of two - i.e., shifting the radix point (with some limits on the total amount you can shift, of course). But no amount of scaling an integer by a power of _two_ will give you `0.1` exactly. – Mark Dickinson Aug 14 '22 at 08:34
  • @nonopolarity: Note that `0x1.999999999999ap-4` is the same thing (same value, same bit pattern stored internally) as `0x1999999999999ap-56` - i.e., it's the integer `7205759403792794` scaled by two to the power `-56` (see the check below these comments). Just two different ways of describing or thinking about the same thing. The important contrast here is between _binary_ floating-point formats and _decimal_ floating-point formats. And _both_ of those are described by (non-obsolete versions of) IEEE 754, so framing this as "IEEE 754 versus traditional" doesn't really make sense here. – Mark Dickinson Aug 14 '22 at 08:40