7

For example, using IEEE-754 32-bit binary floating point, let's represent the value of 1 / 3. It cannot be done exactly, but 0x3eaaaaab produces the closest value to 1 / 3. You might want to write the value in decimal and let the compiler convert the decimal literal to a binary floating-point number.

0.333333f    -> 0x3eaaaa9f (0.333332986)
0.3333333f   -> 0x3eaaaaaa (0.333333313)
0.33333333f  -> 0x3eaaaaab (0.333333343)
0.333333333f -> 0x3eaaaaab (0.333333343)

You can see that 8 (significant) decimal digits are enough to represent the value as correctly as possible (closest to the actual value).

I tested with π and e (the base of the natural log), and both needed 8 decimal digits for the closest representation.

3.14159f    -> 0x40490fd0 (3.14159012)
3.141593f   -> 0x40490fdc (3.14159298)
3.1415927f  -> 0x40490fdb (3.14159274)
3.14159265f -> 0x40490fdb (3.14159274)

2.71828f    -> 0x402df84d (2.71828008)
2.718282f   -> 0x402df855 (2.71828198)
2.7182818f  -> 0x402df854 (2.71828175)
2.71828183f -> 0x402df854 (2.71828175)

However, √2 appears to need 9 digits.

1.41421f     -> 0x3fb504d5 (1.41420996)
1.414214f    -> 0x3fb504f7 (1.41421402)
1.4142136f   -> 0x3fb504f4 (1.41421366)
1.41421356f  -> 0x3fb504f3 (1.41421354)
1.414213562f -> 0x3fb504f3 (1.41421354)

https://godbolt.org/z/W5vEcs695
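
For reference, a minimal sketch along these lines (not necessarily what the linked Godbolt page contains) can reproduce the tables above. It assumes IEEE-754 binary32 `float` and that `float` and `uint32_t` have the same size; `show` is just an illustrative helper name.

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Print a float's bit pattern and its value to 9 significant decimal digits. */
static void show(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's storage as an integer */
    printf("0x%08" PRIx32 " (%.9g)\n", bits, f);
}

int main(void)
{
    show(0.333333f);
    show(0.3333333f);
    show(0.33333333f);
    show(0.333333333f);
    return 0;
}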

Looking at these results, it's probably right that a decimal floating-point literal with 9 significant digits is sufficient to produce the closest possible 32-bit binary floating-point value, and in practice something like 12~15 digits would certainly be enough if space for storing the extra digits doesn't matter.

But I'm interested in the math behind it. How can one be sure that 9 digits is enough in this case? What about double or even arbitrary precision? Is there a simple formula to derive the number of digits needed?


The current answers and the links in the comments confirm that 9 digits is enough for most cases, but I've found a counterexample where 9 digits is not enough. In fact, infinite precision in the decimal format is required for some values to always be correctly converted (rounded to the closest representable value) to a given binary floating-point format (IEEE-754 binary32 for this discussion).

8388609.499 represented with 9 significant decimal digits is 8388609.50. This number converted to float has the value of 8388610. On the other hand, the number represented with 10 or more digits will always preserve the original value, and this number converted to float has the value 8388609.

You can see 8388609.499 needs more than 9 digits to be most accurately converted to float. There are infinitely many such numbers, lying very close to the halfway point between two representable values in the binary float format.
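
Here is a short sketch demonstrating the counterexample; it assumes the compiler converts decimal floating constants with correct rounding and ties-to-even, as GCC and Clang do in practice.

#include <stdio.h>

int main(void)
{
    /* 9 significant digits: 8388609.50 is exactly halfway between the two
       nearest floats (8388609 and 8388610), and ties-to-even picks 8388610. */
    float nine_digits = 8388609.50f;

    /* 10 significant digits: 8388609.499 is below that halfway point,
       so the nearest float is 8388609. */
    float ten_digits = 8388609.499f;

    printf("%.1f\n", nine_digits);  /* expected: 8388610.0 */
    printf("%.1f\n", ten_digits);   /* expected: 8388609.0 */
    return 0;
}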

xiver77
  • If you multiply the number of bits in the significand by log10(2), which is 0.30103, that gives you the number of significant decimal digits that can be represented. But the number of accurate decimal *places* depends on the integral part of the value. So for a `float` having about 7 digits accuracy, any value > 9999999 has zero accuracy in its decimal places. If you want the mentioned 12~15 digits accuracy, use `double`. – Weather Vane Apr 25 '22 at 08:08
  • @WeatherVane IEEE-754 32-bit float has 23 bits in the significand, and `23 * log10(2) = 6.9236899`. The integer part is `6`. Then, how is `9` derived? – xiver77 Apr 25 '22 at 08:12
  • What I meant is that the integral part of `12345.6789` is `12345`. The number of accurate decimal places depends on that, as well as the number of bits in the significand. – Weather Vane Apr 25 '22 at 08:13
  • @WeatherVane I don't want 12~15 digits of accuracy, I mean 12~15 digits in a decimal floating-point literal will be more than enough to be converted to the most correct binary 32-bit floating point value. – xiver77 Apr 25 '22 at 08:14
  • The most correct value of an irrational or recurring value will be held by the largest type you have available. Never use `float`, ever (it isn't 1980 any more) unless you have a very good reason why you need to use `float`. – Weather Vane Apr 25 '22 at 08:16
  • @WeatherVane I appreciate your interest in this question, but could you please read the contents of this post carefully before leaving a comment? BTW `float` is still used heavily in scientific computing and computer graphics, where extra precision is unnecessary in most cases. – xiver77 Apr 25 '22 at 08:20
  • As I said, the most digits available is what you should use. For every computation you do, in a chain of them, you lose even more accuracy. It doesn't matter what the value is; some aren't "more accurate than others" except perhaps in the last decimal digit you use. There is no need to cherry-pick the number of places you use: use the most available. – Weather Vane Apr 25 '22 at 08:21
  • @WeatherVane And I asked **what** is the most digits available, and **how** can it be derived? – xiver77 Apr 25 '22 at 08:23
  • And I said, the most available is the number of bits * 0.30103, and that the number of decimal places also depends on the magnitude of the integral part of the value. – Weather Vane Apr 25 '22 at 08:24
  • @WeatherVane That is `6.923...`, but the result from my tests is `9`. There is a visible difference. – xiver77 Apr 25 '22 at 08:25
  • Note that there is an implicit 24th bit set to one in the mantissa of an IEEE 754 32-bit `float`. Use `%a` if you want to round-trip. – Bob__ Apr 25 '22 at 08:30
  • You are getting 9 because you are using 32 bits. – Weather Vane Apr 25 '22 at 08:35
  • Do you start with a real number and want to know how many decimal digits of that you need to have a number which is rounded to the closest floating point number? Or do you start with a floating point number and want the minimal number of decimal digits to represent it (in the worst case)? – chtz Apr 25 '22 at 08:36
  • @chtz I think it's closer to the first, but also thinking about irrationals. You might want to set a floating point constant from a known series of decimal digits. Then, how many digits do you need to put there for the most accurate representation? – xiver77 Apr 25 '22 at 08:39
  • @chtz An answer here let me know `FLT_DECIMAL_DIG`, which apparently is `9` on my machine. Maybe this is what I want, but I'm wondering because I'm still not sure how the explanation for that constant written in the standard matches the case I've explained in this question. (*number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value*) – xiver77 Apr 25 '22 at 08:43
  • @WeatherVane No I don't get `9` for 32-bit floats. IEEE-754 32-bit float (`float` on most machines) has a 23-bit significand, so `23 * log10(2) = 6.923...` is the result using the formula you suggested. – xiver77 Apr 25 '22 at 08:54
  • Bruce Dawson wrote an excellent article about this and much more on his tech blog _Random Ascii_. I recommend you have a look at it and the many other excellent investigations he did. [Float Precision–From Zero to 100+ Digits](https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/) – kvantour Apr 25 '22 at 10:15
  • @kvantour Thanks for the link to the post. That again confirms that 9 decimal digits is enough to uniquely identify a `float`. But the case in this question is slightly different from uniquely identifying a `float`. See my comments to the current answer to this question. – xiver77 Apr 25 '22 at 10:28
  • It's similar to double, where depending on the trip order the round-trip precision is 15 or 17 digits: [How do we need 17 significant decimal digits to identify an arbitrary double-precision floating-point number?](https://stackoverflow.com/q/68784030/995714). You can see the formula here: [Number of Digits Required For Round-Trip Conversions](https://www.exploringbinary.com/number-of-digits-required-for-round-trip-conversions/) – phuclv Apr 25 '22 at 11:02
  • [According to Wikipedia](https://en.wikipedia.org/wiki/IEEE_754#Character_representation), the answer is indeed 9, and there's a general formula there for any number of bits: `1 + ceil(nbits * log10(2))`, which gives 9 for the 24 significant bits of an IEEE-754 `float`. – Steve Summit Apr 25 '22 at 11:19
  • So the answer is infinity. Is that all you wanted to know? Is there any other specific question being asked? – Eric Postpischil Apr 25 '22 at 16:07
  • @EricPostpischil For this question, yes, I'm happy to know that for most numbers the answer is `9`, but in theory this could go up to infinity. – xiver77 Apr 25 '22 at 16:23
  • This point may have been made, but to be clear: the reason the answer you're getting of "9, but maybe infinity" is different from the FLT_DECIMAL_DIG answer of "9" is that they're answering different questions. FLT_DECIMAL_DIG tells you how many digits you need to guarantee a round trip from an IEEE-754 `float`, to a string, and back to a `float` again. But you're asking, I think, for the minimum digits required to convert from a value you have in mind, via a string (specifically a floating-point constant in source code) to an IEEE-754 `float` best representing the value you had in mind. – Steve Summit Apr 25 '22 at 18:27
  • 2^-23 ~ 10^-7 answers the question –  Apr 27 '22 at 15:51

3 Answers

8

I think you are looking for the *_DECIMAL_DIG constants. The C standard provides a short explanation and a formula for how they are calculated (N2176 C17 draft):

5.2.4.2.2 Characteristics of floating types <float.h>

  1. The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign:

    ...

    • number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,

      p log10 b        if b is a power of 10
      ⌈1 + p log10 b⌉  otherwise
      
      
      FLT_DECIMAL_DIG  6
      DBL_DECIMAL_DIG  10
      LDBL_DECIMAL_DIG 10
      

With IEEE-754 32-bit float, b = FLT_RADIX = 2 and p = FLT_MANT_DIG = 24, so the result is FLT_DECIMAL_DIG = ⌈1 + 24 log10 2⌉ = 9. (⌈x⌉ = ceil(x) is the ceiling function: round the result up.)
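
As a quick check on your own implementation (a sketch; it assumes a C11-or-later <float.h>, since C99 only provides the combined DECIMAL_DIG):

#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("FLT_DECIMAL_DIG  = %d\n", FLT_DECIMAL_DIG);   /* 9 with IEEE-754 binary32 */
    printf("DBL_DECIMAL_DIG  = %d\n", DBL_DECIMAL_DIG);   /* 17 with IEEE-754 binary64 */
    printf("LDBL_DECIMAL_DIG = %d\n", LDBL_DECIMAL_DIG);  /* e.g. 21 for x87 80-bit long double */
    return 0;
}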

user694733
  • That is `6` for float, but the *constant* I want is at least `9`, as is apparent from the tests. – xiver77 Apr 25 '22 at 08:26
  • @xiver77 See paragraph above; those are minimum values. Your machine should report 9. – user694733 Apr 25 '22 at 08:27
  • Oh, yes it is 9. Interesting! I'll have a look! – xiver77 Apr 25 '22 at 08:29
  • Nothing about your answer, but the punctuation for that sentence in the standard made me waste 10 minutes. "any floating-point number with *p radix b digits*" here, it looks like "p radix, b digits". It should be "p radix-b digits" or "p, radix b digits" to avoid confusion. – xiver77 Apr 25 '22 at 09:05
  • @xiver77 Yes, I agree it is confusingly written. I hope the calculation I added at the end of the answer helps future readers a little bit. – user694733 Apr 25 '22 at 09:10
  • Your answer was very helpful and solved 97% of my question, but I still can't resolve the thought that there might be an extreme corner case for some value `x` whose exact value is right between two representable values of `float`. Any value of `float`, `f`, can be converted to a decimal form `d` of 9 significant digits and get converted back without loss. Here, `d` would be very close to `f`. – xiver77 Apr 25 '22 at 09:54
  • Let's say `x` is also close to `f`. Then, `x` is also close to `f + some very small amount` because it is right between those two values. Wouldn't `x` represented by 9 decimal digits possibly be mapped to the other value very close to `f`? – xiver77 Apr 25 '22 at 09:54
  • You can see that the base value (like `x`) and `d` has some difference in the examples (0.333333333 | 0.333333343, 3.14159265 | 3.14159274). I'm thinking about the extreme case where the difference between the two values is the maximum. – xiver77 Apr 25 '22 at 10:02
  • @xiver77 When `x` is not exactly representable in `float`, it is converted to either of the 2 nearest values depending on the rounding rules. It is impossible to get `x` back because data has been lost. After that conversion, the C standard promises that, using `FLT_DECIMAL_DIG` digits, you can convert the value to decimal and back without further loss. You can improve the situation by using a data type with more precision, but even then, there will always be some numbers where conversion between binary and decimal will require infinite precision. – user694733 Apr 25 '22 at 10:37
  • "there will always be some numbers where conversion between binary and decimal will require infinite precision", that was the incomplete but convincing conclusion I also reached. Thanks for the confirmation. – xiver77 Apr 25 '22 at 14:26
  • Thinking about IEEE binary32 floats, where `2^23 + 0.5 = 8388608.5` is right between two representable values, `8388608.4999...0` with *a lot* but finite number of `9`s will be rounded incorrectly when converted to binary unless the decimal representation continues to the point where the series of `9`s end. So this is one example where 9 digits is not enough, if I didn't miss something. – xiver77 Apr 25 '22 at 14:50
  • But I was wrong; both GCC and Clang round even `8388608.5` to `8388608`. So it seems the way the compilers round numbers is a bit different from how humans do. Still, I think it's possible to make a similar example. – xiver77 Apr 25 '22 at 14:53
  • Finally, `8388609.4999...f` is a working example (`f` is needed to avoid the rounding error from `double` to `float`). With `n` decimal digits in the fractional part, any decimal representation with less than `n` digits in the fractional part will be rounded to `8388609.5` which is then rounded to `8388610`, while the correctly rounded result is `8388609`. This number will require at least `n + 7` decimal digits for correct conversion to binary32. – xiver77 Apr 25 '22 at 15:20
6

What about double or even arbitrary precision, is there a simple formula to derive the number of digits needed?

From C17 § 5.2.4.2.2 11: FLT_DECIMAL_DIG, DBL_DECIMAL_DIG, LDBL_DECIMAL_DIG

number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,

p_max log10 b        if b is a power of 10
⌈1 + p_max log10 b⌉  otherwise


But I'm interested in the math behind it. How can one be sure that 9 digits is enough in this case?

Each range of binary floating point like [1.0 ... 2.0), [128.0 ... 256.0), [0.125 ... 0.25) contains 2^(p-1) values uniformly distributed. e.g. with float, p = 24.

Each range of a decade of decimal text with n significant digits in exponential notation like [1.0 ... 9.999...), [100.0 ... 999.999...), [0.001 ... 0.00999...) contains 9 × 10^(n-1) values uniformly distributed.

Example: common float:
When p is 24, with 2^24 combinations, n must be at least 8 to form the 16,777,216 combinations needed to distinctly round-trip float to decimal text to float. As the end-points of the decimal ranges above may fall well within that set of 2^24, the larger decimal values are spaced further apart. This necessitates a +1 decimal digit.

Example:

Consider the 2 adjacent float values

10.000009_5367431640625
10.000010_49041748046875

Both convert to the same 8-significant-digit decimal text "10.000010". 8 is not enough.

9 is always enough as we do not need more than 167,772,160 to distinguish 16,777,216 float values.
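
A sketch that demonstrates the collision with 8 digits and the separation with 9 (it assumes IEEE-754 binary32 `float`; `nextafterf` comes from <math.h> and may need -lm to link):

#include <math.h>
#include <stdio.h>

int main(void)
{
    float a = 10.0000095367431640625f;  /* exactly representable: 10 + 10 ulps */
    float b = nextafterf(a, 11.0f);     /* the adjacent float just above it    */

    printf("%.8g %.8g\n", a, b);  /* 8 significant digits: both print 10.00001   */
    printf("%.9g %.9g\n", a, b);  /* 9 significant digits: 10.0000095 10.0000105 */
    return 0;
}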


OP also asks about 8388609.499. (Let us only consider float for simplicity.)

That value is nearly half-way between 2 float values.

8388609.0f  // Nearest lower float value
8388609.499 // OP's constant as code
8388610.0f  // Nearest upper float value

OP reports: "You can see 8388609.499 needs more than 9 digits to be most accurately converted to float."

And let us review the title "What is the minimum number of significant decimal digits in a floating point literal*1 to represent the value as correct as possible?"

This new question part emphasizes that the value in question is the value of the source code 8388609.499 and not the floating point constant it becomes in emitted code: 8388609.0f.

If we consider the value to be the value of the floating point constant, only up to 9 significant decimal digits are needed to define the floating point constant 8388609.0f; 8388609.49 as source code is sufficient.
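
For instance, in this sketch (assuming a correctly rounding compiler) both source texts produce the same float constant:

#include <stdio.h>

int main(void)
{
    float a = 8388609.499f;  /* many digits, but the nearest float is 8388609 */
    float b = 8388609.49f;   /* 9 significant digits, same nearest float      */

    printf("%.1f %.1f same=%d\n", a, b, a == b);  /* expected: 8388609.0 8388609.0 same=1 */
    return 0;
}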

But getting the closest floating point constant for some number given as code can indeed take many digits.

Consider the typical smallest float, FLT_TRUE_MIN, with the exact decimal value of:

0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

Halfway between that and 0.0 is 0.000..(~39 more zeroes)..0007006..(~100 more digits)..15625.

If that last digit were 6 or 4, the closest float would be FLT_TRUE_MIN or 0.0f respectively. So now we have a case where 109 significant digits are "needed" to select between 2 possible floats.
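
A milder version of the same effect can be sketched with strings near that halfway point (2^-150 ≈ 7.006492321624085e-46); this assumes a correctly rounding `strtof`, such as glibc's, and that subnormals are not flushed to zero:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* These two strings differ only in the 10th significant digit,
       yet land on opposite sides of the halfway point 2^-150. */
    float below = strtof("7.006492321e-46", NULL);  /* expected: 0            */
    float above = strtof("7.006492322e-46", NULL);  /* expected: FLT_TRUE_MIN */

    printf("%g\n%g\n", below, above);
    return 0;
}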

To keep us from going over the cliffs of insanity, IEEE-754 has already addressed this.

The number of significant decimal digits a translation (compiler) must examine to be compliant with that spec (not necessarily the C spec) is far more limited, even if the extra digits could translate to another FP value.

IIRC, it is in effect FLT_DECIMAL_DIG + 3. So for a common float, as few as 9 + 3 significant decimal digits may be examined.

[Edit]

Correct rounding is only guaranteed for the number of decimal digits required plus 3 for the largest supported binary format.


*1 C does not define *floating point literal*, but it does define *floating point constant*, so that term is used.

chux - Reinstate Monica
  • Thanks for the clear explanation. Thinking it with the number of possible combinations makes the problem easier to identify. Could you also have a look at the bottom part of the OP from the recent edit? – xiver77 Apr 25 '22 at 16:25
  • I'm accepting your answer because you already explained much more than I asked, but I'm very interested in why "the number of significant decimal digits a compiler must examine to be compliant with that spec" is `FLT_DECIMAL_DIG + 3`. Please do explain this part whenever you feel like and leave a reply so I can get a ping. – xiver77 Apr 25 '22 at 19:46
  • Both GCC and Clang seem to examine over 1000 decimal digits in practice (https://godbolt.org/z/e9dz6sjf4), but what the spec defines is still interesting to me. – xiver77 Apr 25 '22 at 20:11
  • @xiver77 Added reference. Maybe later add the 754 spec quote - do not have electronics access right now. – chux - Reinstate Monica Apr 26 '22 at 17:16
3

What is the minimum number of significant decimal digits in a floating point literal to represent the value as correct as possible?

There is no guarantee from the C standard that any number of decimal digits in a floating-point literal will produce the nearest value actually representable in the floating-point format. In discussing floating-point literals, C 2018 6.4.4.2 3 says:

… For decimal floating constants, … the result is either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner…

For quality, C implementations should correctly round floating-point literals to the nearest representable value, with ties going to the choice with the even low digit. In that case, the FLT_DECIMAL_DIG, DBL_DECIMAL_DIG, and LDBL_DECIMAL_DIG values defined in <float.h> provide numbers of digits that always suffice to uniquely identify a representable value.
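
A small round-trip check along those lines (a sketch; it assumes a C11 <float.h> for FLT_DECIMAL_DIG and a correctly rounding strtof):

#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    float original = 0.1f;  /* any float value */
    char text[64];

    /* Format with FLT_DECIMAL_DIG significant digits, then parse it back. */
    snprintf(text, sizeof text, "%.*g", FLT_DECIMAL_DIG, original);
    float roundtrip = strtof(text, NULL);

    printf("%s -> %s\n", text, original == roundtrip ? "unchanged" : "changed");
    return 0;
}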

How can one be sure that 9 digits is enough in this case?

You need statements to this effect in the compiler documentation, such as statements that it provides correct rounding for floating-point literals and that it uses IEEE-754 binary32 (a.k.a. “single precision”) for float (or some other format that would only require nine significant digits to uniquely identify all representable values).

What about double or even arbitrary precision, is there a simple formula to derive the number of digits needed?

The C standard indicates the constants above are calculated as p log10 b if b is a power of ten and ceil(1 + p log10 b) otherwise, where p is the number of digits in the floating-point format and b is the base used in the format. These always suffice, but the latter is not always necessary. The latter provides the number of digits needed if the exponent range were unbounded; its “1 +” covers all possible allowances for how the powers of b interact with the powers of 10, in a sense. But any floating-point format has a finite exponent range, and, for some choices of exponent range, ceil(p log10 b) would suffice instead of ceil(1 + p log10 b). There is no simple formula for this. It does not occur with the standard IEEE-754 formats and can be neglected in practice.
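
As a rough sketch of that calculation (the ⌈1 + p log10 b⌉ form with b = 2; the p values for the wider formats are the usual IEEE-754 and x87 significand sizes, an assumption about the implementation rather than something the C standard mandates, and `decimal_dig` is just an illustrative helper):

#include <math.h>
#include <stdio.h>

/* Decimal digits that always suffice to uniquely identify a binary
   floating-point value with p significand bits: ceil(1 + p*log10(2)). */
static int decimal_dig(int p)
{
    return (int)ceil(1.0 + p * log10(2.0));
}

int main(void)
{
    printf("binary32   (p = 24):  %d\n", decimal_dig(24));   /* 9  */
    printf("binary64   (p = 53):  %d\n", decimal_dig(53));   /* 17 */
    printf("x87 80-bit (p = 64):  %d\n", decimal_dig(64));   /* 21 */
    printf("binary128  (p = 113): %d\n", decimal_dig(113));  /* 36 */
    return 0;
}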

Eric Postpischil
  • There are some numbers that need more than `9` digits for correct conversion to IEEE binary32. `8388609.4999...f` with a lot but finite number of `9`s is one example (`f` is needed to avoid the rounding error from `double` to `float`). With `n` decimal digits in the fractional part, any decimal representation with less than `n` digits in the fractional part will be rounded to `8388609.5` which is then rounded to `8388610`, while the correctly rounded result is `8388609`. This number will require at least `n + 7` decimal digits for correct conversion to binary32. – xiver77 Apr 25 '22 at 15:16
  • @xiver77: It is not clear what you mean. You discuss some sort of double rounding, apparently from some number 8388609.4999…9 to some smaller number of decimal digits and then to `float`. You do not **need** more than 9 digits to get the `float` value you want for 8388609.4999…9, because you **can** get that `float` value by using `8388609`, which has just seven digits. You are asking some other question… – Eric Postpischil Apr 25 '22 at 15:41
  • … Maybe it is this: What is the minimum number *d* such that, for any real number *x* within a floating-point range, rounding *x* to a decimal numeral *D* with *d* significant digits and then rounding *D* to the floating-point format produces the same result as rounding *x* to the floating-point format? The answer to that question is there is no such finite number *d*. This is the Table Maker’s Dilemma; there is always a rounding-point between two representable numbers where the decision to round to one versus the other changes, and there are numbers arbitrarily close to that point. – Eric Postpischil Apr 25 '22 at 15:43
  • Please see the added sentences at the bottom of the OP. I hope that clarifies. – xiver77 Apr 25 '22 at 15:44