
I've written an arbitrary-precision rational number class that needs to provide a way to convert to floating point. This can be done straightforwardly via BigDecimal:

return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();

but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as an initial guess because that is approximately the precision of a double-precision floating-point number, but I don't know whether that's actually correct.

What would be the correct number to use, defined as the smallest number such that making it any larger would not make the answer any more accurate?
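For reference, here is a sketch of how the conversion might sit in the class, with the scale pulled out as a parameter so different values can be tried. The `Rational` class and its `num`/`den` fields are only illustrative placeholders:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Illustrative sketch only; num and den stand in for the class's actual fields.
public final class Rational {
    private final BigInteger num;
    private final BigInteger den;

    public Rational(BigInteger num, BigInteger den) {
        this.num = num;
        this.den = den;
    }

    // Convert to double via an intermediate decimal with the given scale.
    public double toDouble(int scale) {
        return new BigDecimal(num)
                .divide(new BigDecimal(den), scale, RoundingMode.HALF_EVEN)
                .doubleValue();
    }
}
```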

rwallace
  • As a decimal, the _exact_ values of common floating-point numbers [need 100s](https://codereview.stackexchange.com/questions/212490/function-to-print-a-double-exactly) of decimal digits to represent. Typically only the first 17 are of interest, but your needs may vary. – chux - Reinstate Monica Oct 07 '19 at 22:31
  • @chux: Per my comments on the current answer, 17 is insufficient, and I think 751 digits are needed. 1152921504606847105 is an example where 19 are needed. – Eric Postpischil Oct 08 '19 at 11:05
  • Actually, maybe infinitely many digits are needed (which implies something is wrong with my prior reasoning about the midpoints). Let x be 2^60 + 2^7 + e. If e is 0, converting this to IEEE-754 binary64 should round down, to 2^60 (2^60 + 2^7 is tied between 2^60 and 2^60 + 2^8, so ties-to-even rounds to 2^60). But if it is positive, it should round up, to 2^60 + 2^8. But if e is 10^−2000 and we round 2^60 + 2^7 + e to 1000 decimal digits, we get 2^60 + 2^7, which rounds down. Clearly e can be 10^−p for any p, so arbitrarily many digits are needed. – Eric Postpischil Oct 08 '19 at 11:33
  • Doing the decimal division with rounding away from zero, instead of to nearest, could solve that arbitrary precision issue. But you would still need enough digits to distinguish the midpoints of binary64 pairs, putting us back at 751. Still, that is a good improvement over infinity. – Eric Postpischil Oct 08 '19 at 11:35
  • Why can't you simply do the calculation with doubles? If the result should be a double there is not much point in doing the division with big decimals. – Henry Oct 11 '19 at 01:38
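A minimal sketch illustrating the midpoint behaviour described in the comments above, assuming BigDecimal.doubleValue() performs a correctly rounded (round-to-nearest-ties-to-even) conversion; the class name is only for the demonstration:

```java
import java.math.BigDecimal;

public class MidpointDemo {
    public static void main(String[] args) {
        // 2^60 + 2^7 lies exactly halfway between the binary64 values 2^60 and 2^60 + 2^8,
        // so round-to-nearest-ties-to-even rounds it down to 2^60 ...
        BigDecimal tie = BigDecimal.valueOf(2).pow(60).add(BigDecimal.valueOf(2).pow(7));
        System.out.println(tie.doubleValue() == 0x1p60);                       // expected: true

        // ... but adding any positive amount, however tiny, must round up to 2^60 + 2^8,
        // so no fixed number of decimal digits can capture the difference.
        BigDecimal e = BigDecimal.ONE.scaleByPowerOfTen(-2000);                // 10^-2000
        System.out.println(tie.add(e).doubleValue() == 0x1.0000000000001p60);  // expected: true
    }
}
```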

1 Answer


Introduction

No finite precision suffices.

The problem posed in the question is equivalent to:

  • What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?

To see this is equivalent, observe that the BigDecimal divide shown in the question returns num/den to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee the closest floating-point number (or one of the tied two) is obtained.

Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 2^52 used below to 2^(w−1), where w is the number of bits in the significand.
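As a quick check of that presumption (a sketch only; 4503599627370497.5 is the midpoint 2^52 + 1 + ½ that appears in the proof below, and a correctly rounded doubleValue() is assumed):

```java
import java.math.BigDecimal;

public class TiesToEvenCheck {
    public static void main(String[] args) {
        // 4503599627370497.5 = 2^52 + 1 + 1/2 is exactly halfway between the
        // binary64 values 2^52 + 1 and 2^52 + 2.
        double d = new BigDecimal("4503599627370497.5").doubleValue();

        // Ties-to-even picks the neighbor with the even significand, 2^52 + 2.
        System.out.println(d == 0x1.0000000000002p52);  // expected: true
    }
}
```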

Proof

One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods; we only need to consider three: ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, using various symmetries.

In the following, suppose we convert to decimal using any large precision p. That is, p is the number of significant decimal digits in the result of the conversion.

Let m be the rational number 2^52+1+½−10^−p. The two binary64 numbers neighboring m are 2^52+1 and 2^52+2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.

In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2^52+1+½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)

2^52+1+½ is equally close to the neighboring binary64 numbers 2^52+1 and 2^52+2, so the round-to-nearest-ties-to-even method produces 2^52+2.

Thus, the result is 2^52+2, which is not the binary64 value closest to m.

Therefore, no finite precision p suffices to round all rational numbers correctly.
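A sketch of the counterexample in Java, assuming BigDecimal.doubleValue() is correctly rounded as presumed above; because the proof counts significant digits, the division uses a MathContext with precision p rather than a fixed scale:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.math.RoundingMode;

public class NoFinitePrecisionDemo {
    public static void main(String[] args) {
        int p = 40;  // any large precision; the same construction works for every p

        // m = 2^52 + 1 + 1/2 - 10^-p, written as the rational num/den.
        BigInteger tenP = BigInteger.TEN.pow(p);
        BigInteger den = tenP.multiply(BigInteger.valueOf(2));
        BigInteger num = BigInteger.ONE.shiftLeft(52).add(BigInteger.ONE)
                .multiply(den).add(tenP).subtract(BigInteger.valueOf(2));

        // Step 1: round num/den to p significant decimal digits (half-even).
        BigDecimal decimal = new BigDecimal(num)
                .divide(new BigDecimal(den), new MathContext(p, RoundingMode.HALF_EVEN));
        // Step 2: convert the decimal to double.
        double viaDecimal = decimal.doubleValue();

        double nearest = 0x1.0000000000001p52;  // 2^52 + 1, the binary64 value closest to m

        System.out.println(viaDecimal == 0x1.0000000000002p52);  // true: we got 2^52 + 2
        System.out.println(viaDecimal == nearest);               // false: not the closest value
    }
}
```

(If the two-step conversion instead uses a fixed scale p, as in the question’s code, the analogous counterexample is m = 2^52 + 1 + ½ − 10^−(p+1).)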

Eric Postpischil