
If I understand IEEE floating point correctly, it is unable to represent some values exactly: it is exact only in limited cases, and pretty much every floating-point operation adds to the accumulated approximation error. Another downside is that the "minimum step" grows with the exponent.
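That growing step is easy to demonstrate (a quick sketch of mine, not from any reference; `nextafter` is the standard C function from `math.h` that returns the adjacent representable double):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Gap between a double and the next representable double:
       the "minimum step" grows with the magnitude (the exponent). */
    printf("step near 1.0:  %g\n", nextafter(1.0, 2.0) - 1.0);    /* ~2.2e-16 */
    printf("step near 1e6:  %g\n", nextafter(1e6, 2e6) - 1e6);    /* ~1.2e-10 */
    printf("step near 1e15: %g\n", nextafter(1e15, 2e15) - 1e15); /* 0.125 */
    return 0;
}
```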

Wouldn't it be better to offer a more concrete representation?

For example, use 20 bits for the "decimal" part, but not all 2^20 values, only 1,000,000 of them, giving a smallest representable step of exactly one millionth, and use the other 44 bits for the integer part, giving quite a range. This way "floating point" numbers can be calculated using integer arithmetic, which may even end up faster. And in the case of multiplication, addition and subtraction there is no accumulation of approximations; the only possible loss is during division.
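In code, the idea might look roughly like this (a minimal sketch: the `fix_t` type and helper names are made up, and for simplicity I scale a single 64-bit integer rather than explicitly splitting off 20 bits):

```c
#include <stdint.h>
#include <stdio.h>

typedef int64_t fix_t;        /* hypothetical fixed-point type       */
#define SCALE 1000000LL       /* value stored as (number * 1000000)  */

/* Addition and subtraction are plain integer operations: exact. */
static fix_t fix_add(fix_t a, fix_t b) { return a + b; }

/* Multiplication yields 1/10^12 resolution, so the product is divided
   back down to 1/10^6; that division is where digits can be lost
   (and the intermediate product can overflow for large operands). */
static fix_t fix_mul(fix_t a, fix_t b) { return (a * b) / SCALE; }

static void fix_print(fix_t x) {      /* non-negative values only */
    printf("%lld.%06lld\n", (long long)(x / SCALE), (long long)(x % SCALE));
}

int main(void) {
    fix_t a = 100000;                 /* 0.1, represented exactly */
    fix_t b = 200000;                 /* 0.2, represented exactly */
    fix_print(fix_add(a, b));         /* prints 0.300000 */
    fix_print(fix_mul(a, b));         /* prints 0.020000 */
    return 0;
}
```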

This concept rests on the fact that 2^n steps are not optimal for representing decimal numbers, e.g. 1 does not divide cleanly into 1024 parts in decimal (1/1024 = 0.0009765625), but it divides neatly into 1000 (1/1000 = 0.001). Technically this gives up some of the available precision, but I can think of plenty of cases where LESS can be MORE.

Naturally, this approach loses both range and precision to some degree, but in all the cases where such extremes are not required, this representation sounds like a good idea.

  • SO is not the place to be asking this question! – Paddyd Sep 10 '13 at 14:10
  • Your system is unable to accurately represent some values, and is accurate in very limited cases. Besides, multiplication will *also* lose accuracy. – Jongware Sep 10 '13 at 14:13
  • IEEE is a standardization institute; you mean the IEEE 754 standard. And there already is a standard for decimal floating point: http://en.wikipedia.org/wiki/Decimal_floating_point – Pascal Cuoq Sep 10 '13 at 14:15
  • possible duplicate of [integers or floating point in situations when either would do?](http://stackoverflow.com/questions/18434108/integers-or-floating-point-in-situations-when-either-would-do) – Eric Postpischil Sep 10 '13 at 17:37
  • This question indicates there is no error in fixed-point multiplication. That is incorrect. Multiplying numbers with resolution 1/10**6 produces results with resolution 1/10**12. The results must be rounded to fit in a fixed 1/10**6 format. – Eric Postpischil Sep 10 '13 at 17:42
  • This question assumes decimal is better than binary, for unknown reasons. Decimal is no more suited to mathematics or the natural world than binary is. Its benefit is that humans use decimal, so computations performed by computers using decimal tend to be less surprising to humans than computations using binary. This does not make the results more accurate. It does tend to mask errors, since it fails to reveal to humans their implicit assumptions about arithmetic. (If the computer makes the same mistake you do, you do not see it.) – Eric Postpischil Sep 10 '13 at 17:44
  • @EricPostpischil - I think you are overstating; I never said that decimal is better than binary. It is more that full-precision binary is not best suited for representing decimal numbers, and not the best solution in some cases. –  Sep 10 '13 at 19:00
  • @user2341104: If decimal is not better than binary, then why limit the fraction part to 1,000,000 instead of the full 1,048,576? What benefit is gained? – Eric Postpischil Sep 10 '13 at 19:18
  • @EricPostpischil - the benefit gained is that 1/1000000 is a nice round one millionth compared to an awkward 0.95...something millionth. Take the number 3.14 - it can be PERFECTLY represented using a step of one millionth (even only one hundredth), while using the full precision you can represent 3.1399..something at best... catch my drift? –  Sep 10 '13 at 20:44
  • @user2341104: There is nothing special about 1/1000000 or 3.14 outside of their appearance in decimal. Hence my comment about assuming that decimal is better than binary. Mathematically, these numbers are not round or particularly nice. There is no mathematical benefit to using decimal over binary. You have chosen these numbers only because they look nice in the numeral system you use, not because decimal is more accurate or faster or otherwise advantageous computationally. – Eric Postpischil Sep 10 '13 at 21:09
  • @EricPostpischil - let me put it this way - using a step of one millionth allows me to fully represent the numbers 0.1 and 0.2 and add them to a nice 0.3. You cannot do that with a step of `1 / 2^20`. I am not saying decimal is better than binary, I am saying a decimal step can be much better at representing concrete decimal numbers, which are what we mostly use in our world. The drawback of lower precision can be completely offset by the benefits of accurate representation and the lack of creeping approximations in applications with specific range and step requirements. How is this hard to get? –  Sep 10 '13 at 22:43
  • @user2341104: There is nothing inherently decimal about most of the numbers we use. Consider calculations in your phone: device orientation from accelerometer data, encoding video data, processing a signal from the cell phone tower. None of it is inherently decimal. Even numbers that originate in decimal representations from humans quickly become non-decimal, such as calculating a monthly interest rate from an annual rate. The notion that there is some preference for decimal is a human foible, not an inherent feature of the world. – Eric Postpischil Sep 11 '13 at 15:51
  • @EricPostpischil - forgetting currency? Prices? Those are decimal in nature; anything else would be immensely consumer-unfriendly. What about weights of products? What about dimensions? How is some binary or completely arbitrary floating-point approximation better than saying, let's say, that a height is 1.82 meters? You ignore the weight of the human factor, which is why most programming is done in the first place. I agree floats are perfect for high-precision scientific stuff, but they are inconvenient at representing "everyday practical values". Humans are not numeral-system agnostic, at least not 99.9999999% of them. –  Sep 12 '13 at 00:53
  • Please implement your proposed arithmetic system and report back to us on how it performs, accuracy problems you encounter, and the reception it receives from programmers. – Eric Postpischil Sep 12 '13 at 04:25

1 Answer


What you propose is fixed-point arithmetic. It's not necessarily about better or worse; each representation has advantages and disadvantages that often make one more suitable than the other for a specific purpose. For example:

  • Fixed-point arithmetic does not introduce rounding errors for operations like addition and subtraction, which makes it suitable for financial calculations. You certainly don't want to store money as floating-point values (see the sketch after this list).

  • Speculation: arguably, fixed point arithmetic is simpler in terms of implementation, which probably leads to smaller, more efficient circuits.

  • Floating-point representation covers an extremely large range: it can be used to store really big numbers (~10^38 for a 32-bit float, ~10^308 for a 64-bit one) and really small positive ones (~10^-320 for a 64-bit one, using denormals), at the expense of precision, while a fixed-point representation is limited linearly by its size.

  • Floating-point precision is not distributed uniformly across the representable range. Instead, most of the values (in terms of the number of representable numbers) lie in the unit ball around 0. That makes it very accurate in the range we operate in most often.
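To make the first point concrete, here is a small sketch of mine (not from any particular library) contrasting binary floating point with a decimal fixed-point scheme that stores cents in an integer:

```c
#include <stdio.h>

int main(void) {
    /* Binary floating point: 0.1 and 0.2 are not exactly representable,
       so the rounding error is visible immediately. */
    printf("%.17f\n", 0.1 + 0.2);                     /* 0.30000000000000004 */

    /* Decimal fixed point: store cents as integers; addition is exact. */
    long cents = 10 + 20;                             /* $0.10 + $0.20 */
    printf("%ld.%02ld\n", cents / 100, cents % 100);  /* 0.30 */
    return 0;
}
```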

You said it yourself:

Technically, this is omitting to make use of the full precision, but I can think of plenty of cases where LESS can be MORE

Exactly, that's the whole point. Depending on the problem at hand, a choice must be made. There is no one-size-fits-all representation; it's always a tradeoff.

Marcin Łoś
  • I was under the impression IEEE floating point representation was supposed to be "one size fits all" considering every programming language I've seen provides only IEEE real numbers. Granted, the scenario I describe is very easy to implement, but still... –  Sep 10 '13 at 14:25
  • As fixed-point systems go, the OP's is unconventional and wasteful. It is more efficient to represent millionths in a 64-bit integer than to split the 64 bits into 20 and 44 and waste some of the values that can be represented in the 20 bits, not to mention the complexity of any operation in that system. – Pascal Cuoq Sep 10 '13 at 14:28
  • @PascalCuoq - it was just an example; obviously, you can create your own implementation based on the range and precision requirements. You will still have to keep it to 8, 16, 32, or 64 bits, however, because otherwise the overhead on the hardware will be significant if using arbitrary bit-width types. –  Sep 10 '13 at 14:31
  • @PascalCuoq Yeah, sure, I was referring to the general concept rather than a concrete realization. – Marcin Łoś Sep 10 '13 at 14:57
  • @user2341104 IEEE 754 binary floating point is not, and never was meant to be, "one size fits all". Rather, it is a matter of a few sizes that fit many, but not all, situations. There are situations in which fixed point or decimal floating point are better. – Patricia Shanahan Sep 10 '13 at 16:07
  • @PatriciaShanahan - and yet, none of the widely used programming languages offers anything else, as far as I am aware. Correct me if I am wrong. –  Sep 10 '13 at 16:24
  • @user2341104 Fixed point is inherent in integer arithmetic, which is offered by most programming languages. Java also supplies a scaled decimal format, java.math.BigDecimal. – Patricia Shanahan Sep 10 '13 at 16:35
  • @PatriciaShanahan: If the largest integer type is e.g. 64 bits, efficient fixed-point math requires primitives for (X*Y)>>64, (X<<64)/Y, and (X<<64)%Y. Many processors have such primitives available, but programming languages seldom expose them in any usable form. – supercat Sep 11 '13 at 20:36