
I am working on software that, among other things, converts measured numbers between text and internal (double) representation. A necessary part of the process is to produce text representations with the correct decimal precision based on the statistical uncertainty of the measurement. The needed precision varies with the number, and the least-significant digit in it can be anywhere, including left of the (decimal) units place.

Correct rounding is essential for this process, where "correct" means according to the floating-point rounding mode in effect at the time, or at least in a well-defined rounding mode. As such, I need to be careful about (read: avoid) performing intermediate arithmetic on the numbers being handled, because rounding can be sensitive even to the least-significant bit in the internal representation of a number.

I think I can do almost all the needed formatting reasonably well with the printf family of functions if I first compute the number of significant digits in the required representation:

sprintf(buffer, "%.*e", num_sig_figs - 1, number);

There is one class of corner cases that has so far defeated me, however: the one where the most significant (decimal) digit in the measured number is one place right of the least significant digit of the desired-precision representation. In that case, rounding should yield the least (and only) significant digit in the desired result as either 0 or 1, but I haven't been able to devise a way to perform the rounding in a portable(*) way without risk of changing the result. This is similar to what the MPFR function mpfr_prec_round() could do, except that it works in binary precision, whereas I need to use decimal precision.

For example, in the default rounding mode (round-to-nearest with ties rounded to even):

  • 0.5 expressed to unit (10^0) precision should be "0" or "0e+00"
  • 654 expressed to thousands (10^3) precision should be "1e+03"
  • 0.03125 expressed to tenths (10^-1) precision should be "0" or "0e-01" or even "0e+00"

(*) "Portable" here means that the code accurately expresses the computation in standard, portable C99 (or better, C90). It is understood that the actual result may depend on machine details, and it should depend (and be consistent with) the floating-point rounding mode in effect.

What options do I have?

John Bollinger
  • rounding of decimal numbers and doubles doesn't go well together... (decimal fractions are periodic in binary) try to use some fixed point data type... – V-X May 20 '13 at 17:20
  • 1
    It would seem these would be handled with code such as `if (x < precision) if (x <= .5*precision) printf("0") else printf("%.0e", precision) else /* Not one of these cases, do other stuff. */…`. Is there a case where what you want differs from what this code would produce? (There is a problem with how `precision` is rounded, since .1 cannot be represented exactly, so you might have some issues exactly at the boundary. Also, negative `x` are not handled, but that is straightforward.) – Eric Postpischil May 20 '13 at 17:38
  • Please explain better the case you think is problematic with your `sprintf` method. I think it is the correct method and I'm confused as to what problem you're having. – R.. GitHub STOP HELPING ICE May 20 '13 at 18:00
  • The sprintf method does not work for the problem case because the sprintf bases its result on the most significant digit of the argument, which (in that case) is not a significant digit of the desired result. For example, `sprintf(buffer, "%.0e", 0.5)` yields "5e-01" instead of "0e+00". Only in the case where the argument rounds up should you get a nonzero result. – John Bollinger May 20 '13 at 21:01
  • I realize that decimal rounding does not harmonize well with binary floating point, and that most decimal fractions have infinite binary representations. Indeed, my internal representation is a somewhat clunky arbitrary-precision decimal format. However, I need the ability to accept numbers into the system in standard floating point format, and I must convert them correctly when I do. – John Bollinger May 20 '13 at 21:10
  • Manual rounding based on comparing with .5 * precision is my present approach, but it has the following limitations: (i) it is hard-coded for one particular rounding mode, rather than using whatever the current FP rounding mode happens to be, and (ii) I am concerned that it will yield the wrong answer in a few cases on account of the (0.5 * precision) cutoff computation not yielding exactly the correct value -- especially when 'precision' is less than 1. – John Bollinger May 20 '13 at 21:13

1 Answer


One simple (albeit fairly inefficient) approach that will always work is to print the full exact decimal value as a string, then do your rounding in decimal manually. This can be achieved by something like

snprintf(buf, sizeof buf, "%.*f", DBL_MANT_DIG-DBL_MIN_EXP, x);

I hope I got that precision right. The idea is that each additional mantissa bit, and each additional negative power of two, takes up one extra decimal place.

You avoid the issue of double rounding by the fact that the decimal value obtained is exact.

Note that double rounding only matters in the default rounding mode (nearest). In other modes, double rounding obtains the same result that would be obtained by a single rounding step, so you can take lots of shortcuts if you like.

There are probably better solutions which I'll post later if I think of them. Note that the above solution will only work on high-quality implementations where the printf family of functions is capable of printing exact decimals. It will fail horribly, for example, on MSVCRT and other low-quality implementations, even some conforming ones.

R.. GitHub STOP HELPING ICE
  • Hmm. I do work with decimal strings internally, with all the attendant inefficiency, so I'm not shy about relying on such an approach here. I'm developing against glibc 2.5, which I think is a decent implementation. I am prepared to turn up my nose at non-conforming implementations, but it is worrisome that conforming ones might still get this wrong. Nevertheless, I'll see whether I can get something along these lines working. Stay tuned. – John Bollinger May 20 '13 at 21:29
  • If you don't like relying on the implementation's `printf` implementation to support exact printing as decimal, you could write your own. All it takes is a very rudimentary bignum working in a power-of-10 base (base-1000000000 is convenient) with just two operations: multiplication and division by powers of two. The most concise (but somewhat cryptic) implementation I'm aware of is mine here: http://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c (see the `fmt_fp` function). – R.. GitHub STOP HELPING ICE May 21 '13 at 03:09
  • `DBL_MANT_DIG-DBL_MIN_EXP` seems like the right idea, but `0.5` takes four characters to represent in decimal including the terminating zero, and it only accounts for 1 of “mantissa bit or negative power of two”. The minimum safe value is probably `DBL_MANT_DIG-DBL_MIN_EXP+3`. – Pascal Cuoq May 22 '13 at 06:27
  • 1
    @PascalCuoq: And `0.5` only takes one place of precision in the `%.*f` sense of precision. `DBL_MANT_DIG-DBL_MIN_EXP` was not the buffer size but the precision. Your buffer needs to be considerably larger: large enough for the precision, the decimal point, and the integer portion (which could be up to about `DBL_MAX_EXP/log2(10)` digits). – R.. GitHub STOP HELPING ICE May 22 '13 at 14:18
  • DBL_MANT_DIG is the number of base-FLT_RADIX digits of mantissa, so it's not quite the right thing (unless you have a decimal FP implementation). Though it does go together with DBL_MIN_EXP, which is a base-FLT_RADIX exponent. In C99 you can use `sprintf(buffer, "%.*e", DECIMAL_DIG - 1, number)`, as long as buffer can accommodate DECIMAL_DIG digits plus decorations including leading sign, decimal point, and exponent. C90 does not require DECIMAL_DIG to be defined, but it can be computed as `ceil(1 + DBL_MANT_DIG * log10(FLT_RADIX))`. – John Bollinger May 23 '13 at 14:38
  • @R..: I did get my own float-to-decimal converter working, along the same lines as your code. In so doing, I decided that the method I first tried to use to convert from float to bignum was flawed because its reliance on FP subtracion introduced a significant amount of rounding error. It looks like your code uses a near identical method, so you might want to evaluate that issue for yourself. I ended up instead scaling the fraction part up to an integer by (FP) multiplication by a power of 2, then moving it to a uint64_t and using integer arithmetic from there. – John Bollinger May 23 '13 at 14:56
  • I'm going to go accept this answer. I'd also like to share a few additional things I learned, mainly that C90 doesn't provide all the pieces needed to do exactly what I wanted. In particular, it does not provide a way to query or manipulate the FP rounding mode, so if you stick with only C90-defined features then you have to hard code a particular mode. – John Bollinger May 23 '13 at 15:15
  • @JohnBollinger: `DBL_MANT_DIG` is exactly the right thing. Starting with 1.0, every time you divide by 2, you need one additional decimal place to represent the result in decimal. Thus for 52 fractional bits, you need 52 decimal places. Also, printing `DECIMAL_DIG` places is **not sufficient** for the algorithm I described; it will result in rounding, and therefore your final result will be double-rounded. – R.. GitHub STOP HELPING ICE May 23 '13 at 17:55
  • As for my algorithm, the subtraction performed is always exact; there is no rounding. Assuming `y` is non-negative and less than `INT_MAX`, `y-(int)y` is always exact. The step that requires some nontrivial proof of exactness is the multiplication by 1000000000. In principle this multiplication would be lossy (1000000000 has 21 bits in its mantissa), but the initial value of `y` was scaled to have 28 integral bits, which are removed from the mantissa during the subtraction. After the first iteration, each additional iteration adds at most 21 bits to `y` and removes at least 21 bits from y. – R.. GitHub STOP HELPING ICE May 23 '13 at 18:03
  • @R..: you are correct, I see, though I'm still trying to wrap my head around why. Each decimal digit _should_ be worth a little over three binary digits, but it all goes pear-shaped on the fractional side. I suppose it's a result of each decimal fraction digit contributing to multiple different binary fraction digits. There's something else I learned! – John Bollinger May 23 '13 at 20:05
  • y-(int)y loses significant digits. It is exact if the extra digits of mantissa added when the result is re-normalized are all zero, but is it safe to assume that will be so? – John Bollinger May 23 '13 at 20:13
  • The only time `y-(int)y` can ever be inexact is when the conversion itself is invalid due to overflow of the range of `int`. I'm having a hard time understanding what your concern is, but it looks to me like you think floating point "renormalization" can introduce random junk bits in the bottom of the mantissa. Per plain ISO C, *anything is possible* (even 2.0+2.0==5.0!) since the standard makes no requirements on accuracy of results, but IEEE arithmetic (the optional Annex F) and all real-world implementations do place strong requirements on accuracy of results. – R.. GitHub STOP HELPING ICE May 23 '13 at 21:16
  • For example, IEEE requires all results of elementary arithmetic operations to be correctly-rounded. In particular, if the mathematical result is exactly representable in the destination floating-point type, then regardless of rounding mode, the result must be that exact value. I'm not aware of any floating point implementation which lacks this guarantee, which is all you need to prove that `y-(int)y` is exact. – R.. GitHub STOP HELPING ICE May 23 '13 at 21:18
  • By the way, the reason you need one additional decimal digit each time you divide by 2 is that 10 has exactly one factor of 2. If you were working with base 16, you would only need an additional place once every 4 times you divided by 2. If you were working in base 20, you would need an additional place every second time you divided by two. And if you were working with base 3, you would need infinitely many places to represent 1/2. – R.. GitHub STOP HELPING ICE May 23 '13 at 21:22