Floating point limits code not producing correct results

Question

I am racking my brain trying to figure out why this code does not get the right result. I am looking for the hexadecimal representations of the floating point positive and negative overflow/underflow levels. The code is based off this site and a Wikipedia entry:

7f7f ffff ≈ 3.4028234 × 10³⁸ (max single precision) -- from wikipedia entry, corresponds to positive overflow

Here's the code:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>

using namespace std;

int main(void) {

    float two = 2;
    float twentyThree = 23;
    float one27 = 127;
    float one49 = 149;


    float posOverflow, negOverflow, posUnderflow, negUnderflow;

    posOverflow = two - (pow(two, -twentyThree) * pow(two, one27));
    negOverflow = -(two - (pow(two, one27) * pow(two, one27)));


    negUnderflow = -pow(two, -one49);
    posUnderflow = pow(two, -one49);


    cout << "Positive overflow occurs when value greater than: " << hex << *(int*)&posOverflow << endl;


    cout << "Neg overflow occurs when value less than: " << hex << *(int*)&negOverflow << endl;


    cout << "Positive underflow occurs when value greater than: " << hex << *(int*)&posUnderflow << endl;


    cout << "Neg overflow occurs when value greater than: " << hex << *(int*)&negUnderflow << endl;

}

The output is:

Positive overflow occurs when value greater than: f3800000
Neg overflow occurs when value less than: 7f800000
Positive underflow occurs when value greater than: 1
Neg overflow occurs when value greater than: 80000001

To get the hexadecimal representation of the floating point, I am using a method described here:

Why isn't the code working? I know it'll work if positive overflow = 7f7f ffff.

@LokiAstari You can't simply substitute `double` into this code, all the expressions in question would need to change, since they overflow and underflow at different values. Also, since the purpose of this code appears to be a better understanding of IEEE floating-point, I would argue that `float` is exactly what he should use. — John Calsbeek, Feb 18 '13 at 03:58

score 3 · Accepted Answer · answered Feb 18 '13 at 03:23

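Your expression for the highest representable positive float is wrong. The page you linked uses (2-pow(2, -23)) * pow(2, 127), and you have 2 - (pow(2, -23) * pow(2, 127)), which subtracts a huge number from 2 instead of scaling a number just below 2. The expression for the most negative representable float has the same kind of mistake; it should be -(2-pow(2, -23)) * pow(2, 127).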
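
As a minimal sketch of the corrected expressions: the subtraction is done before the multiplication, and the bits are read with std::memcpy rather than a pointer cast. The helper names float_bits, pos_overflow and neg_overflow are illustrative and not from the question, and the code assumes float is a 32-bit IEEE 754 type.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy the object representation of a float into a 32-bit integer.
// Assumption: float is 32 bits wide (only the size is checked here).
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Largest finite float: (2 - 2^-23) * 2^127, subtraction first.
// The arithmetic is done in double, where every intermediate value is exact.
float pos_overflow() {
    return static_cast<float>((2.0 - std::pow(2.0, -23)) * std::pow(2.0, 127));
}

// The most negative finite float has the same magnitude with the sign flipped.
float neg_overflow() {
    return -pos_overflow();
}
```

On such a platform, float_bits(pos_overflow()) should come out as 0x7f7fffff and float_bits(neg_overflow()) as 0xff7fffff, which matches the Wikipedia value quoted in the question.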

Your underflow expressions look correct, however, and so do the hexadecimal outputs for them.

Note that, once corrected, posOverflow and negOverflow are simply +FLT_MAX and -FLT_MAX. Your posUnderflow and negUnderflow, however, are actually smaller in magnitude than FLT_MIN (because they are denormal, and FLT_MIN is the smallest positive normal float).
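
A small sketch of that last point, assuming IEEE 754 floats with denormals enabled (no flush-to-zero mode). The function names are made up for illustration.

```cpp
#include <cfloat>
#include <cmath>

// 2^-149, the value the question calls posUnderflow, built exactly with ldexp.
float smallest_subnormal() {
    return std::ldexp(1.0f, -149);
}

// True when f is positive but smaller than the smallest normal float.
bool below_flt_min(float f) {
    return f > 0.0f && f < FLT_MIN;
}

// True when f is nonzero but stored in denormal (subnormal) form.
bool is_subnormal(float f) {
    return std::fpclassify(f) == FP_SUBNORMAL;
}
```

Under those assumptions, both below_flt_min(smallest_subnormal()) and is_subnormal(smallest_subnormal()) should be true, while is_subnormal(FLT_MIN) should be false.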

Potatoswatter · Answer 2 · 2013-02-18T03:42:43.207

Floating point loses precision as the number gets bigger. A number of the magnitude 2¹²⁷ does not change when you add 2 to it.
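
To make that concrete, here is a rough sketch (helper names are hypothetical, IEEE 754 single precision assumed). Near 2¹²⁷ the distance between neighbouring floats is 2¹⁰⁴, so adding 2 rounds straight back to the original value.

```cpp
#include <cmath>

// 2^127 as a float; ldexp builds the power of two exactly.
float two_to_127() {
    return std::ldexp(1.0f, 127);
}

// Distance from x to the next representable float above it.
float gap_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// True when adding `small` to `big` leaves `big` unchanged after rounding.
bool is_absorbed(float big, float small) {
    volatile float sum = big + small;  // volatile forces rounding to float
    return sum == big;
}
```

With these, is_absorbed(two_to_127(), 2.0f) should be true, and gap_above(two_to_127()) should equal 2¹⁰⁴.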

Other than that, I'm not really following your code. Using words to spell out numbers makes it hard for me to read.

Here is the standard way to get the floating-point limits of your machine:

#include <limits>
#include <iostream>
#include <iomanip>

std::ostream &show_float( std::ostream &s, float f ) {
    s << f << " = ";
    std::ostream s_hex( s.rdbuf() );
    s_hex << std::hex << std::setfill( '0' );
    for ( char const *c = reinterpret_cast< char const * >( & f );
          c != reinterpret_cast< char const * >( & f + 1 );
          ++ c ) {
        s_hex << std::setw( 2 ) << ( static_cast< unsigned int >( * c ) & 0xff );
    }
    return s;
}

int main() {
    std::cout << std::hex;
    std::cout << "Positive overflow occurs when value greater than: ";
    show_float( std::cout, std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Neg overflow occurs when value less than: ";
    show_float( std::cout, - std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Positive underflow occurs when value less than: ";
    show_float( std::cout, std::numeric_limits< float >::min() ) << '\n';
    std::cout << "Neg underflow occurs when value greater than: ";
    show_float( std::cout, - std::numeric_limits< float >::min() ) << '\n';
}

output:

Positive overflow occurs when value greater than: 3.40282e+38 = ffff7f7f
Neg overflow occurs when value less than: -3.40282e+38 = ffff7fff
Positive underflow occurs when value less than: 1.17549e-38 = 00008000
Neg underflow occurs when value greater than: -1.17549e-38 = 00008080

The output depends on the endianness of the machine. Here the bytes are reversed due to little-endian order.

Note, "underflow" in this case isn't a catastrophic zero result, but just denormalization which gradually reduces precision. (It may be catastrophic to performance, though.) You might also check numeric_limits< float >::denorm_min() which produces 1.4013e-45 = 01000000.
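
The gradual part can be seen with a sketch like the following, which assumes IEEE 754 floats, the default round-to-nearest mode, and no flush-to-zero option (the function name is invented for this example). It halves the smallest normal float until the result rounds to zero and counts the denormal values it passes through.

```cpp
#include <limits>

// Halve the smallest normal float (2^-126) until it reaches zero,
// counting the nonzero denormal values seen on the way down.
int subnormal_halvings() {
    volatile float x = std::numeric_limits<float>::min();
    int steps = 0;
    for (;;) {
        x = x / 2.0f;          // each halving drops one bit of precision
        if (x == 0.0f) break;  // 2^-150 is a tie that rounds to zero
        ++steps;
    }
    return steps;
}
```

Under those assumptions the count should be 23, one step for each mantissa bit between 2⁻¹²⁶ and denorm_min() at 2⁻¹⁴⁹.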

`numeric_limits::min()` is the smallest representable *normalized* float, not the smallest representable float. So you are actually finding different values than the linked article in those cases. — John Calsbeek, Feb 18 '13 at 03:34
@JohnCalsbeek Yeah, was editing as you posted that :) fixed now. — Potatoswatter, Feb 18 '13 at 03:36

aib · Answer 3 · 2013-02-18T03:08:49.567

Your code assumes integers have the same size as a float (so do all but a few of the posts on the page you've linked, btw.) You probably want something along the lines of:

for (size_t s = 0; s < sizeof(myVar); ++s) {
    // Take the address of myVar and view the object as raw bytes.
    unsigned char byte = reinterpret_cast<unsigned char*>(&myVar)[s];
    // byte number s of myVar is now in byte
}

that is, something akin to the templated code on that page.
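
One possible shape for such a template, offered as a sketch rather than the code from the linked page (hex_bytes is a name chosen here). It copies the object into a byte array, so it makes no assumption about sizeof(int), and it prints the bytes in memory order, so the result still depends on endianness.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hex dump of an object's bytes, in memory order.
// On a little-endian machine the least significant byte comes first.
template <typename T>
std::string hex_bytes(const T& value) {
    unsigned char raw[sizeof(T)];
    std::memcpy(raw, &value, sizeof(T));  // copy out the object representation
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Two hex digits per byte, zero padded.
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(raw[i]));
        out += buf;
    }
    return out;
}
```

For example, hex_bytes(1.0f) should give 0000803f on a little-endian IEEE 754 machine and 3f800000 on a big-endian one.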

Your compiler may not be using those specific IEEE 754 types. You'll need to check its documentation.

Also, consider using std::numeric_limits<float>::min()/max() or the FLT_ constants from <cfloat> for determining some of those values.

edited Feb 18 '13 at 03:08

answered Feb 18 '13 at 03:03

`*(int*)&` assumes that `float` is the same size as an `int` (and that the compiler will let strict aliasing violations slide). Your code assumes the machine's endianness (or it would once you tried to use the bytes). All the "proper" ways to do this are architecture- or compiler-dependent. – John Calsbeek Feb 18 '13 at 03:27
