How to round float to the upper/lower representation

Question

When I compute:

float res = 1.123123123123;

I suppose that the res variable is approximated to the nearest possible float representation of 1.123123123123.

Is it possible to approximate it to the next lower or next upper float representation instead?

C++ doesn't dictate floating point formats; it depends on the compiler, and also what architecture you're running on. However, most everything uses IEEE 754, and you could figure out the answer for that format. — Mooing Duck, Mar 04 '15 at 20:25
Are you asking us to calculate the floating representation of that specific number, and what value that equates to, or did you want code that finds bounds somehow, or what are you asking us? If you want an estimate, how tight does the estimate have to be? — Mooing Duck, Mar 04 '15 at 20:27
I seem to remember seeing, in at least one compiler's version of one of the include files, macros or inline functions that would return the next or previous sequential FP number. I highly suspect they're not very standard, though... even if IEEE-754 is... — twalberg, Mar 04 '15 at 20:51
@twalberg: http://en.cppreference.com/w/cpp/numeric/math/nextafter — Mooing Duck, Mar 04 '15 at 21:04
@MooingDuck Yeah... that's them. Requires C99 or C++11 or later... Might be present in some form in certain compilers before that, though, which would be the "not very standard" part... — twalberg, Mar 04 '15 at 21:06

Pascal Cuoq · Accepted Answer · 2015-03-05T10:39:28.340

You are lucky that you want it as a float. With most compilation platforms mapping float to IEEE 754 binary32 and double to IEEE 754 binary64, you can obtain the correct answer in an overwhelming majority of cases with, in C syntax:

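#include <fenv.h>

#pragma STDC FENV_ACCESS ON // must precede the declarations in its block
double d = 1.123123123123;
int save_round = fegetround();
fesetround(FE_DOWNWARD); // returns nonzero on failure; should be checked ideally
float f = d; // converted under the downward rounding mode
fesetround(save_round); // restore the previous rounding mode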

If you wanted the same thing for a double, you could use a long double for the intermediate value, provided that long double is wider than double on your platform, and remembering to write the constant with the L suffix: long double ld = 1.123123123123L;

There are very rare cases (most of which require the person crafting the decimal representation to act in obvious bad faith) for which the method above does not work. The reason is double rounding: the decimal constant is first rounded to a double, and that double is then rounded to a float. The snippet below, on the other hand, works in all cases if your compilation platform goes very far in offering IEEE 754 formats and operations (according to IEEE 754's principles, conversion from decimal to binary is supposed to respect the rounding mode):

#pragma STDC FENV_ACCESS ON
int save_round = fegetround();
fesetround(FE_DOWNWARD);
float f = strtof("1.123123123123", NULL); // declared in <stdlib.h>; second argument is the end pointer
fesetround(save_round);

In theory, you might not even need to invoke strtof, but it is not entirely clear to me that the compiler should convert floating-point constants according to the dynamic rounding mode, even with #pragma STDC FENV_ACCESS ON.
