Finding out whether conversion from decimal to binary floating point is exact, rounding up, or rounding down

Question

I know that values that can be exactly represented in decimal floating point are often unable to be represented as exact values in binary floating point. It is easy to demonstrate in e.g. python

a = float("0.5")
print('%.17e' % (a))
5.00000000000000000e-01

a = float("0.054")
print('%.17e' % (a))
5.39999999999999994e-02

a = float("0.055")
print('%.17e' % (a))
5.50000000000000003e-02

That's entirely to be expected and perfectly proper. But what I want to know is whether there is an easy way of finding out whether the converted value is higher or lower than the exact value (or indeed if it converts exactly). From the examples above I could obviously do something with the output string but that seems extraordinarily clumsy, and it is such a useful thing to know (for example if you are incrementing a floating point number in a loop you can use it to decide whether you are going to get the expected number of iterations or not) that I am hoping there is a more straightforward way of doing this. I'm only using python as an example here - I'd prefer a language agnostic solution.

Re “if you are incrementing a floating point number in a loop you can use it to decide whether you are going to get the expected number of iterations or not”: No, you cannot. For example, consider incrementing through 0 ≤ x < .036 by .001 using IEEE-754 binary32. .001 converts to 0.001000000047497451305389404296875 and .036 converts to 0.03599999845027923583984375, and 36 times that .001+ is greater than that .036−, so you would expect at most 36 iterations. But there are 37, as many of the intermediate roundings happen to round down, so the 36th sum is 0.0359999947249889373779296875. — Eric Postpischil, Nov 27 '20 at 11:47
But if I round down the increment and round up the end target will that still be a problem? I can see that if I round them both down then there may be circumstances when that doesn't work. I can also see that there will be edge cases when the increment is very small and the threshold is very big but if I end up in that territory then all bets are off! — Bill Sellers, Nov 27 '20 at 13:11
Re “But if I round down the increment and round up the end target will that still be a problem?”: Yes. In binary32, incrementing from 0 to .27825 by .00001 has 27,824 iterations. In general, you cannot predict how the intermediate rounding errors will accumulate. Even if the increment is too high (or too small), the rounding errors may make the sum too small (or too high). The common way to increment through floating-point values is to increment with integer values and scale the counter to the desired range, so there is no accumulation of rounding error. — Eric Postpischil, Nov 27 '20 at 13:18
E.g., in C, something like `for (int i = 0; i < N; ++i) { double x = (end-start) * i / N + start; … }`. — Eric Postpischil, Nov 27 '20 at 13:20

chux - Reinstate Monica · Accepted Answer · 2020-11-27T11:16:12.723

With some languages, under select conditions, code can control the rounding mode with the conversion as to nearest, up or down (or toward 0, or ...).

Then one can compare the conversion results of the default (typically to nearest) to up or down.

#include <fenv.h>
#include <assert.h>
#include <stdio.h>

double string_to_double(int round_dir, const char *s) {
  #pragma STDC FENV_ACCESS ON
  int save_round = fegetround();
  int setround_ok = fesetround(round_dir);
  assert(setround_ok == 0);
  char *endptr;
  double d = strtod(s, &endptr);
  fesetround(save_round);
  return d;
}

// Return 1: up, -1: down, 0: exact
int updn(const char *s) {
  double up = string_to_double(FE_UPWARD, s);
  double nr = string_to_double(FE_TONEAREST, s);
  double dn = string_to_double(FE_DOWNWARD, s);
  // Others modes: FE_TOWARDZERO

  printf("%.17e, %.17e, %.17e, ", dn, nr, up);
  if (up == dn) {
    assert(up == nr);
    return 0;
  }
  if (up > nr) return -1;
  if (dn < nr) return 1;
  return 0;  // Unexpected, unless NaN
}

int main() {
  printf("%2d\n", updn("0.5"));
  printf("%2d\n", updn("0.054"));
  printf("%2d\n", updn("0.055"));
}

Output: 1: up, -1: down, 0: exact

5.00000000000000000e-01, 5.00000000000000000e-01, 5.00000000000000000e-01,  0
5.39999999999999994e-02, 5.39999999999999994e-02, 5.40000000000000063e-02, -1
5.49999999999999933e-02, 5.50000000000000003e-02, 5.50000000000000003e-02,  1

That's really helpful. In fact being able to specify up or down rather than nearest would sort out my example use case. — Bill Sellers, Nov 27 '20 at 11:29
@BillSellers: Your example use case does not work. See my comment on the question with a counterexample. — Eric Postpischil, Nov 27 '20 at 11:51
@BillSellers Concerning "to decide whether you are going to get the expected number of iterations or not": and `0 ≤ x < .036 by .001`, perhaps better to iterate with an integer: `long n = lround(0.036/0.001); for (i=0; i — chux - Reinstate Monica, Nov 27 '20 at 12:44
It is still nice having directionality on the rounding because even with an integer counting up, I know in advance that I can get close to my limit but not exceed it which is the behaviour I want. Equally there might be other cases where I want to guarantee to exceed the limit. All in all it is very useful to have the option. — Bill Sellers, Nov 27 '20 at 13:01

Finding out whether conversion from decimal to binary floating point is exact, rounding up, or rounding down

1 Answers1