0

I am wondering whether we can make rules about the form that floating-point approximations of real numbers take.

For instance, can a floating-point number end in 1.xxx777777... (that is, end in an infinite run of 7s, possibly with one arbitrary digit at the very end)?

I believe floating-point numbers can only take these forms:

1. An exact value.

2. A value like 1.23900008721..., where 1.239 is approximated with digits that appear as "noise", but with zeros between the exact value and this noise.

3. A value like 3.2599995, where 3.26 is approximated by a run of 9s and a final digit (such as 5), i.e. by a floating-point number just below the real number.

4. A value like 2.000001, where 2.0 is approximated by a floating-point number just above the real number.

Guillaume Paris
  • 10,303
  • 14
  • 70
  • 145
  • In what "category" of yours would you put the number 1.21212121212121212121212121... (or any repeating sequence like that)? Or Pi? (And what's the difference between 2) and 4)?) – Mat Dec 05 '12 at 20:34
  • 2
    You are trying to reason in decimal about binary floating-point numbers. This will not lead you anywhere. – Pascal Cuoq Dec 05 '12 at 20:34
  • @Mat: I didn't know this kind of floating-point number exists, so that would be a 5th category – Guillaume Paris Dec 05 '12 at 20:37
  • @Pascal: you're probably right, so the answer to my question could be yes? A floating-point number can end in .77779 / .7777? – Guillaume Paris Dec 05 '12 at 20:38
  • They don't "exist" as floating point numbers, but they are real numbers that don't fit in your categories. What's the point of these categories anyway? – Mat Dec 05 '12 at 20:38
  • Hmm, sorry, yes, I get you: irrational real numbers are approximated by cutting off the end, so if digit10 returns 7 we get 1.212120, I guess. So a fifth category, actually – Guillaume Paris Dec 05 '12 at 20:43
  • Floating point numbers are by their construction rational, so no irrational numbers can be precisely represented in floating point (but that's regardless of the base - binary or decimal). In addition to that, the vast majority of rational numbers cannot be represented as floating point numbers (finite precision). However the set of unrepresentable rationals in decimal is not equivalent to the unrepresentable rationals in binary. – twalberg Dec 05 '12 at 20:53

2 Answers

3

You are thinking in terms of decimal numbers, that is, numbers that can be represented as n*(10^e), with e either positive or negative. These numbers occur naturally in your thought processes for historical reasons having to do with having ten fingers.

Computer numbers are represented in binary, for technical reasons that have to do with an electrical signal being either present or absent.

When you are dealing with smallish integer numbers, it does not matter much that the computer representation does not match your own, because you are thinking of an accurate approximation of the mathematical number, and so is the computer, so by transitivity, you and the computer are thinking about the same thing.

With either very large or very small numbers, you will tend to think in terms of powers of ten, and the computer will definitely think in terms of powers of two. In these cases you can observe a difference between your intuition and what the computer does, and also, your classification is nonsense. Binary floating-point numbers are neither more nor less dense near numbers that happen to have a compact representation as decimal numbers. They are simply represented in binary, n*(2^p), with p either positive or negative. Many real numbers have only an approximate representation in decimal, and many real numbers have only an approximate representation in binary. These two sets are not the same: every binary floating-point number can be represented in decimal, though not always compactly, whereas some decimal numbers cannot be represented exactly in binary at all, for instance 0.1.

If you want to understand the computer's floating-point numbers, you must stop thinking in decimal. 1.23900008721.... is not special, and neither is 1.239. 3.2599995 is not special, and neither is 3.26. You think they are special because they are either exactly or close to compact decimal numbers. But that does not make any difference in binary floating-point.


Here are a few pieces of information that may amuse you, since you tagged your question C++:

If you print a double-precision number with the format %.16e, you get a decimal number that converts back to the original double. But it does not always represent the exact value of the original double. To see the exact value of the double in decimal, you need even more digits (around %.53e for numbers of this magnitude; the exact count depends on the exponent).

If you write 0.1 in a program, the compiler interprets this as meaning 1.000000000000000055511151231257827021181583404541015625e-01, which is a relatively compact number in binary.

Your question speaks of 3.2599995 and 2.000001 as if these were floating-point numbers, but they aren't. If you write these numbers in a program, the compiler will interpret them as 3.25999950000000016103740563266910612583160400390625 and 2.00000100000000013977796697872690856456756591796875.

So the pattern you are looking for is simple: the decimal representation of a double is about 17 significant digits followed by a few dozen further "noise" digits, as you call them (for doubles near these magnitudes, roughly 53 significant digits in total). The noise digits are sometimes all zeroes, and the significant digits can end in a bunch of zeroes too.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • You're totally right, but if we limit ourselves to the set of real numbers that don't exceed the digit10 value in C++ for either float or double, we only get two kinds of errors: irrational real numbers, which have to be truncated, and real numbers (like 0.1) which have to be approximated because they have no representation in the mathematical model of the floating-point system. Based on that, I thought we could categorize the forms that the approximation always takes, but apparently I am wrong – Guillaume Paris Dec 05 '12 at 21:12
0

Floating point is represented by bits. What this means is:

  1. The first bit after the binary point represents 0.5, or 1/2
  2. The second bit represents 0.25, or 1/4
  3. etc.

This means a floating-point value is only approximately close, not exact, unless the number is a sum of powers of 2 that fits in the bits the machine can handle.

Rational numbers can be represented very accurately by the machine (though not exactly, of course, unless the denominator is a power of two), but irrational numbers will always carry an error. In that sense your question is not so much about C++ as about computer architecture.

Konstantin Dinev
  • 34,219
  • 14
  • 75
  • 100
  • a rational number like 0.1 cannot be accurately represented by the machine – Guillaume Paris Dec 05 '12 at 20:39
  • @Guillaume07 very accurately does not mean precisely accurate. I just meant less error – Konstantin Dinev Dec 05 '12 at 20:40
  • Hmm, it's just a different kind of error, but are you sure this kind of error leads to less error than encoding an irrational number? In other terms: for you, encoding 0.1 versus 1/3, does 1/3 contain more loss than 0.1? – Guillaume Paris Dec 05 '12 at 20:47
  • @Guillaume07, 0.1 is the same as 1/10. Neither 1/10 nor 1/3 can be represented exactly in binary, so they will both have an error. Without calculating the actual error it's impossible to predict which will be greater. – Mark Ransom Dec 05 '12 at 20:57