6

Consider decimal representations of the form d1.d2d3d4d5...dnExxx where xxx is an arbitrary exponent and both d1 and dn are nonzero.

Is the maximum n known such that there exists a decimal representation d1.d2d3d4d5...dnExxx such that the interval (d1.d2d3d4d5...dnExxx, d1.d2d3d4d5...((dn)+1)Exxx) contains an IEEE 754 double?

n should be at least 17. The question is how much above 17.

This number n has something to do with the number of digits that it is enough to consider in a decimal-to-double conversion such as strtod(). I looked at the source code for David M. Gay's implementation hoping to find an answer there. There is an allusion to “40” but it is unclear whether this is a consequence of a sound mathematical result or just a statistically safe bound. Also the comment about “truncating” makes it sound like 0.5000000000000000000000000000000000000000000000000001 may be converted to 0.5 in round-upwards mode.

Musl's implementation seems to read approximately 125*9 digits, which is a lot. Then it switches to “sticky” mode:

if (c!='0') x[KMAX-4] |= 1;

Finally, how does the answer change when substituting “contains an IEEE 754 double” with “contains the midpoint of two consecutive IEEE 754 doubles”?

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • 1
    I'm not sure I understand the point. For example, `2^(-1074)` has 751 significant decimal digits, thus there is a decimal representation `d1.d2...d750E-324` satisfying the condition (you can get longer ones, but not much). But you only need a handful of these digits to determine the closest IEEE754 `double`. – Daniel Fischer Jun 21 '13 at 22:13
  • 1
    @DanielFischer But 2^(-1074) is not in the exclusive interval (d1.d2...d750E-324, d1.d2...(d750+1)E-324). By the way (d750+1) is a slight abuse of notation if d750 is a “9”. This abuse is also in my question but the alternative (d1.d2...d750E-324, (d1.d2...d750E-324 + 1E-1074)) is confusing too. I may even get the exponent wrong. – Pascal Cuoq Jun 21 '13 at 22:19
  • 1
    I used one digit less than the exact representation has, so it is in the open interval. – Daniel Fischer Jun 21 '13 at 22:20
  • 1
    I guess your problem is that you want/need to _efficiently_ parse in rounding modes other than "to nearest", right? – Daniel Fischer Jun 21 '13 at 22:21
  • @DanielFischer I tried to write the question so that the first part, up to the last sentence, was about the conversion in directed modes and the last sentence about the conversion in round-to-nearest. Now it seems to me that the two answers differ widely (750+ and 17) but I still think I somehow managed to ask the right questions. – Pascal Cuoq Jun 21 '13 at 22:45
  • @aka.nice After 760 or so digits, you can't stop, because of `0.500...001` types of representations, but you can go to sticky mode: it only matters whether there remains a non-zero digit in the tail or not. This is what Musl does. – Pascal Cuoq Jun 22 '13 at 12:46
  • @PascalCuoq yes, I realized that just after posting. Note that n depends on binary exponent, and ranges from 309 for emax to 767 for emin, passing thru a minimum of 16 for (2^53-1). – aka.nice Jun 22 '13 at 14:25

1 Answers1

6

When you have a subnormal number with odd significand, that is, an odd multiple of 2^(-1074), you have a number whose last nonzero digit in the decimal representation is the 1074th after the decimal point. You then have around 300 zeros directly following the decimal point, so the number has around 750-770 significant decimal digits. The smallest positive subnormal, 2^(-1074) has 751 significant digits, and the largest positive subnormal, (2^52-1)*2^(-1074) has 767 significant digits (I think that is the maximum).

So there is at least one sequence d1, ..., d766 of decimal digits such that there is an IEEE754 double in the open interval

(d1.d2...d766E-308, d1.d2...(d766 + 1)E-308)

The answer does not change much if we consider "contains the midpoint of two consecutive IEEE754 doubles", since subnormal doubles have all roughly the same amount of significant decimal digits, and the midpoint of two consecutive such too.

In the worst case, the entire digit sequence must be consumed (consider "0.5000000...0001" with arbitrarily many zeros before the final 1 that determines that the result shall be 0.5 + 0.5^53 and not 0.5 when rounding away from zero or up).

However, there are only

floor(DBL_MANT_DIG * log 2 / log 10) + 2 = 17

significant decimal digits necessary to distinguish between different double values, so a relatively easy, albeit probably not very efficient, method of parsing would be to parse the first (at least 17) digits (and the exponent) to the closest double, and compare the input string with the exact representation of that double value (and its neighbour).

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • Thanks for your answer. The number n I was trying to determine was the number of decimal digits that must be parsed in normal mode before it is safe to go to “sticky” mode. Arbitrarily long decimal sequences such as 0.50000...00001 can be built but after a while it only matters whether there remains a non-zero digit or not, and the question I wanted to answer was “after how many digits, exactly?”. I still need to ponder the difference between directed and to-nearest modes, but you have helped me greatly. – Pascal Cuoq Jun 21 '13 at 23:00
  • 1
    I don't think I buy your argument that 17 digits is enough. Look at 1 + 45/2^53 and 1 + 46/2^53. `1.00000000000000505` rounds down but `1.000000000000005059` rounds up. The difference is in the 18th decimal place. – tmyklebu Jun 21 '13 at 23:03
  • 1
    @tmyklebu There is no IEEE754 `double` with the value `1 + 45/2^53`. That needs 54 bits of precision to be represented. The next smaller after `1 + 46/2^53` is `1 + 44/2^53 = 1.000000000000004884981308350688777863979339599609375`. – Daniel Fischer Jun 21 '13 at 23:06
  • Whoops, yeah. Regardless, it seems like you could write down arbitrarily many digits of a midpoint; where should `1.0000000000000001110223024625156540423631668090820312` round? What about `1.0000000000000001110223024625156540423631668090820313`? What if you do the same between the smallest and second-smallest subnormals as you had done? – tmyklebu Jun 21 '13 at 23:13
  • @tmyklebu That depends on the rounding mode. With round-to-nearest, to `1.0` and `1.0000000000000002220446049250313080847263336181640625` respectively. With round-away-from-zero or round-up, both to the latter. If you write down the exact midpoint, and use round-to-nearest, ties are usually broken by choosing the one with last bit 0. – Daniel Fischer Jun 21 '13 at 23:16
  • @DanielFischer: Right, and we needed to parse 50-odd digits to find out. – tmyklebu Jun 21 '13 at 23:18
  • 1
    I think there is a difference between “this double was pretty-printed in decimal to 17 significant digits rounded to nearest, so if I parse it (rounding to nearest), I am sure to get the original double” and “I only need to read 17 of this arbitrary sequence of decimal digits to know the nearest double”. The first works, but the second doesn't, does it? Still trying to understand, thinking out loud... – Pascal Cuoq Jun 21 '13 at 23:19
  • @tmyklebu Oops, right. It was only for the `double` -> decimal -> `double` round-tripping that 17 digits were always sufficient. Thanks for the heads-up. – Daniel Fischer Jun 21 '13 at 23:19
  • 1
    @PascalCuoq Right, that was exactly it. – Daniel Fischer Jun 21 '13 at 23:21
  • For mid-point, just consider significands of 54 bits and least bit to 2^-1075 in case of subnormal, that is exactly one more decimal digit is required – aka.nice Jun 22 '13 at 13:58
  • 2
    Consider (2^-1022-2^1074) - the predecessor of twice fminNormalized - it requires 767 digits – aka.nice Jun 22 '13 at 14:23