1

I have an input floating point value that is 0.0f <= value < 1.0f (note less than one).

When multiplying this value up to a larger range, naturally the floating point precision is decreased meaning the value can end up outside of the equivalent range.

For example if I start off with a value such as:

0.99999983534521f

Then multiply it by 100, I get:

100.000000000000f

Which is fine, but how do I then reduce the floating point representation to be the nearest floating point value that is still less than 100?

I found this little manual trick:

union test
{
    int integer;
    float floating;
};

test value;

value.floating = 1.0f;

printf("%x\n", value.integer);

Then I take that hex value and reduce it by one hex digit, then set it explicitly like so:

unsigned int almost_one = 0x3f7fffff;

float value = 1.0f;

if (value >= 1.0f)      std::memcpy(&value, &almost_one, sizeof(float));

That works well for this specific value, but is there a more general approach I can use instead?

I'm hoping there's a magic instruction I'm not aware of that I can use to achieve this!

Edit: Great set of answers here, std::nextafter looks like what I'm after. Unfortunately I can't yet use C++11 math libraries so this won't work for me. To save complicating things, I'll tag this question with C++11 and accept Mike's answer below.

I've started a new question for C++03 : Alternative to C++11's std::nextafter and std::nexttoward for C++03?

Community
  • 1
  • 1
Dan
  • 33,953
  • 24
  • 61
  • 87
  • 1
    That looks like undefined behaviour... – autistic May 01 '13 at 16:44
  • 1
    What are you trying to achive? If your result is exactly 100.0f then that is the closest number to the actual result. decreasing the mantissa by one only makes your result less acurate. – typ1232 May 01 '13 at 16:46
  • Another question of vital importance... How are you printing these values? – autistic May 01 '13 at 16:50
  • 1
    You get 100.0 from `0.99999983534521f` because it is actually 1.0, since there are enough 9's and the 8 in there to flip it over - it is not less accurate, it's just what the limit of the number is (if you print the original number, you'd get 1.0, since that is it's value). Bear in mind that a float is only 24 bits of mantissa, which gives around 7 significant digits in decimal form. Multiplying a number does not reduce its precision. Floating point will lose precision when adding or subtracting larger numbers, because the number has to be normalised [decimal point at the same place]. – Mats Petersson May 01 '13 at 16:51
  • @MatsPetersson Good remark, but multiplication is approximate too, although it is never as problematic as addition/subtraction. When the initial range is 0.0 .. 1.0, however, a value `x` may **never** cross over from `x < 1.0` to `x * M > M`. You should expand your comment into an answer. – Pascal Cuoq May 01 '13 at 16:57
  • I'm not sure what you're trying to do with the "manual" trick, but as written, it's very much undefined behavior. Did you perhaps invert the use of the fields: `value.floating = 1.0; printf( "%x\n", value.integer );` will probably do what you want. (Still undefined behavior, but most implementations will do the right thing.) – James Kanze May 01 '13 at 16:58
  • 2
    @MatsPetersson: No, the IEEE-754 32-bit binary floating-point value nearest 0.99999983534521 is 0.999999821186065673828125, and there are two more after that before you get to 1. – Eric Postpischil May 01 '13 at 17:01
  • @PascalCuoq That was my initial reaction too, but I think what he meant was that the results of a multiplication will have as many significant bits as the operands. (Multiplying two 24 bit numbers precisely will result in up to 47 bits. But the top 24 will be correct.) – James Kanze May 01 '13 at 17:01
  • @JamesKanze: 2**24-1 has 24 bits, but (2**24-1)*(2**24-1) is 281474943156225, which has 48 bits. – Eric Postpischil May 01 '13 at 17:11
  • @EricPostpischil & PascalCuoq : Now I'm confused. One of you is saying that the number can be represented as an float value, and the other is saying that "what is described in the original post can't happen if the value is less than 1.0". Now, I'm not sure who is wrong, but surely the number should be less than 100 if the number can be represented as a float. 100.0 shouldn't lead to any rounding problems in itself, as (small) integer values can always be represented in floating point "as is". I shouldn't have made that comment, clearly. – Mats Petersson May 01 '13 at 17:15
  • 1
    @MatsPetersson: I do not see that Pascal Cuoq says that the problem described in the question cannot happen. He writes that `x*M > M` cannot be true. However, the problem is that `x*M == M` can be true. – Eric Postpischil May 01 '13 at 17:21
  • Ok, so I take it then, that where it goes "wrong" is that the last significant bit (outside of the high 24 bits) becomes one, so the whole number is rounded up by one bit, leading to a result of 100.000000, instead of the expected 99.99998... And I must stop commenting on floating point numbers (and volatile questions) as I keep giving an answer that is almost, but not quite right. – Mats Petersson May 01 '13 at 17:29
  • Thanks for all your replies. I've just edited the question. I'm after a solution for C++03, which I should have mentioned in the original question, but I've left this one for C++11 and created a new question for C++03. – Dan May 02 '13 at 10:46

2 Answers2

6

I'm hoping there's a magic instruction I'm not aware of that I can use to achieve this!

If you've got a C++11 (or C99) standard library, then std::nextafter(value, 0.0f) from <cmath> (or nextafter from <math.h>) will give you the largest representable value smaller than value.

It gives the "next" distinct value after the first argument, in the direction of the second; so here, the next distinct value closer to zero.

Mike Seymour
  • 249,747
  • 28
  • 448
  • 644
  • 1
    It gives you the largest representable value smaller than `value` if `value` is positive. That is satisfactory in the case described in the problem. However, if you want the next largest represent value in all cases, it should be `std::nextafter(value, -std::numeric_limits().infinity())`. – Eric Postpischil May 01 '13 at 17:04
  • @EricPostpischil: Indeed, by "smaller" I meant "smaller magnitude", and added the second paragraph to clarify that I meant "towards zero". I'd be a bit nervous messing around with infinities unless I needed to, so I didn't mention it in relation to this question where "towards zero" is good enough. – Mike Seymour May 01 '13 at 17:08
0

Sorry for the confusion, I've missed the point at the first time. What you are looking for is of course unit in the last place (ULP), which is closely related to machine epsilon. Here is the demo:

#include <iostream>
#include <cmath>
#include <cassert>

float compute_eps() {
  float eps = 1.0f;

  // Explicit cast to `float` is needed to
  // avoid possible extra precision pitfalls.
  while (float(1.0f + eps) != 1.0f)
    eps /= 2.0f;

  return eps;
}

float ulp(float x) {
  int exp;

  frexp(x, &exp);

  static float eps = compute_eps();

  return eps * (1 << exp);
}

main() {
  float x = 100.0f;
  float y = x - ulp(x);
  float z = nextafterf(x, 0.0f);

  assert(y == z);

  std::cout.precision(20);
  std::cout << y << std::endl;
  std::cout << z << std::endl;
}

Please, understand that this answer was intended more as educational rather than practical. I wanted to show which quantities (from theory) have to be involved in order to determine neighbouring floating-point numbers. Of course one could use std::numeric_limits<T>::epsilon() to determine machine epsilon, for example. Or go ahead and use the bullet-proof nextafterf (which is likely to be implemented much more efficiently than my demo) to directly get the neighbouring floating-point number. All in all, don't judge this answer too seriously.

NOTE: Special cases of exponent (like NaN, infinity, subnormal, and etc.) are not handled in this demo. But it's pretty straightforward how to extend this demo to support them.

Alexander Shukaev
  • 16,674
  • 8
  • 70
  • 85
  • The quantity that you subtract from `value` to obtain the next lower representable number is called ULP, not epsilon. – Pascal Cuoq May 01 '13 at 17:05
  • @Pascal Cuoq: Look [here](https://en.wikipedia.org/wiki/Unit_in_the_last_place) on how ULP is related to machine epsilon. – Alexander Shukaev May 01 '13 at 17:10
  • I know how ULP is related to epsilon, I am just saying that the OP wants to obtain the nearest lower floating-point number to a value near 100, and subtracting a machine epsilon is not going to get him there. – Pascal Cuoq May 01 '13 at 17:11
  • The expression `1.0f + eps != 1.0f` may fail; the C++ standard permits an implementation to use extra precision, and some implementations do. – Eric Postpischil May 01 '13 at 18:46
  • @Eric: Do you have a link at hand? – Alexander Shukaev May 01 '13 at 18:52
  • [C++ draft n3092](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) clause 5, paragraph 11: “The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.” – Eric Postpischil May 01 '13 at 19:04
  • @Eric: Sure, they can and usually are when calculations are performed. But there is no mention that during the comparison operations like `!=` or `==` they are treated with this additional precision. They will be definitely truncated to the expected precision before the comparison. As a result, I think this is wrong assumption, as that would be completely unexpected and awkward behavior. – Alexander Shukaev May 01 '13 at 19:11
  • @Haroogan: `==` and `!=` are operators, and they form equality expressions per clause 5.10, which is a sub-clause of clause 5, Expressions. Values are required to be converted to the precision and width of their actual type by casts and assignments but not by `==` and `!=`. E.g., 5.17, Assignment operators, states in paragraph 4 that the expression (on the right side) is converted to the type of the left operand, per clause 4 (see 4.8). The clause on equality operators does not state this. – Eric Postpischil May 01 '13 at 19:17
  • 1
    @Eric: We had a discussion on this issue [here](http://stackoverflow.com/questions/16325277/floating-point-equality-test-and-extra-precision-can-this-code-fail), if anyone is interested. – Alexander Shukaev May 01 '13 at 22:10