
In C programming, I found a weird problem that runs counter to my intuition. When I declare an integer as INT_MAX (2147483647, defined in limits.h) and implicitly convert it to a float value, it works fine, i.e., the float value is the same as the maximum integer. But when I convert the float back to an integer, something interesting happens: the new integer becomes the minimum integer (-2147483648).
The source code looks like this:

int a = INT_MAX;
float b = a; // b is correct
int a_new = b; // a_new becomes INT_MIN

I am not sure what happens when the float number b is converted to the integer a_new. So, is there any reasonable way to find the maximum value that can be converted back and forth between the integer and float types?

PS: The value of INT_MAX - 100 works fine, but this is just an arbitrary workaround.

Himanshu
houtoms
  • floating point has only limited precision so I don't think this conversion is weird. – ymonad May 02 '14 at 04:58
  • Contrary to your comment, `b` is *not* correct. If you look at it closely I think you'll find it is actually `INT_MAX+1` after the first conversion. `INT_MAX` on your platform is 2147483647, *not* 2147483648. I.e., the *first* conversion is where the resulting delta is first introduced. [**See it live**](http://ideone.com/A9duGf) – WhozCraig May 02 '14 at 05:02
  • A 4-byte float uses 23 bits to store the mantissa and 9 to store the sign and exponent. This means that storing the largest 32-bit integers cannot be done completely accurately. – Jonathan Leffler May 02 '14 at 05:14
  • It looks like the problem doesn't happen on all platforms: see [example here](http://ideone.com/cNP5te) – Aurélien Gasser May 02 '14 at 05:19
  • @AurélienGasser It's a fallacy. I'm sure that the compiler optimizes by default. Disable the optimizations and then see the results. The number in question cannot be represented exactly as a floating point, so there is no way to get it back. – devnull May 02 '14 at 05:31

1 Answer


This answer assumes that float is an IEEE-754 single precision float encoded in 32 bits, and that an int is 32 bits. See this Wikipedia article for more information about IEEE-754.


Floating point numbers have only 24 bits of precision, compared with 32 bits for an int. Therefore every int value from 0 to 16777216 (2^24) has an exact representation as a float, but numbers greater than 16777216 do not necessarily have exact representations as floats. The following code demonstrates this fact (on systems that use IEEE-754).

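for ( int a = 16777210; a < 16777224; a++ )
{
    float b = a;  // int -> float rounds to the nearest representable value
    int c = b;    // float -> int truncates toward zero
    unsigned int bits;
    memcpy( &bits, &b, sizeof bits );  // needs <string.h>; replaces the aliasing-unsafe *((int*)&b)
    printf( "a=%d c=%d b=0x%08x\n", a, c, bits );
}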

The expected output is

a=16777210 c=16777210 b=0x4b7ffffa
a=16777211 c=16777211 b=0x4b7ffffb
a=16777212 c=16777212 b=0x4b7ffffc
a=16777213 c=16777213 b=0x4b7ffffd
a=16777214 c=16777214 b=0x4b7ffffe
a=16777215 c=16777215 b=0x4b7fffff
a=16777216 c=16777216 b=0x4b800000
a=16777217 c=16777216 b=0x4b800000
a=16777218 c=16777218 b=0x4b800001
a=16777219 c=16777220 b=0x4b800002
a=16777220 c=16777220 b=0x4b800002
a=16777221 c=16777220 b=0x4b800002
a=16777222 c=16777222 b=0x4b800003
a=16777223 c=16777224 b=0x4b800004

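Of interest here is that the float value 0x4b800002 (16777220) is used to represent the three int values 16777219, 16777220, and 16777221, and thus converting 16777219 to a float and back to an int does not preserve the exact value of the int. Above 16777216 only every second integer is representable, and an odd value that sits exactly halfway between two floats is rounded to the one whose last mantissa bit is 0.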
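Rather than hard-coding 16777216, the limit can be derived from the constants in <float.h>. The following is only a sketch and not part of the original answer: the helper names max_contiguous_float_int and round_trips_exactly are hypothetical, and it assumes that FLT_RADIX raised to the power FLT_MANT_DIG fits in a long long (it is 2^24 for IEEE-754 single precision).

```c
#include <float.h>
#include <limits.h>

/* Largest N such that every integer in [0, N] is exactly representable
   as a float: FLT_RADIX raised to the power FLT_MANT_DIG.
   Assumption: the result fits in a long long (2^24 for IEEE-754). */
long long max_contiguous_float_int(void)
{
    long long limit = 1;
    for (int i = 0; i < FLT_MANT_DIG; i++)
        limit *= FLT_RADIX;
    return limit;
}

/* Returns 1 if a survives int -> float -> int unchanged, 0 otherwise.
   The range check runs before the conversion back, because converting
   an out-of-range float to int is undefined behavior.
   Assumption: (float)INT_MIN is exact (true for 32-bit two's-complement int). */
int round_trips_exactly(int a)
{
    float f = (float)a;
    if (!(f >= (float)INT_MIN && f < -(float)INT_MIN))
        return 0;
    return (int)f == a;
}
```

On a system matching the assumptions of this answer, round_trips_exactly reports success for 16777216 and failure for 16777217, 16777219, and INT_MAX.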


The two floating point values that are closest to INT_MAX are 2147483520 and 2147483648, which can be demonstrated with this code

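for ( int a = 2147483520; a < 2147483647; a++ )
{
    float b = a;
    int c = b;  // undefined behavior once b is 2147483648, which is out of range for int
    unsigned int bits;
    memcpy( &bits, &b, sizeof bits );  // needs <string.h>; replaces the aliasing-unsafe *((int*)&b)
    printf( "a=%d c=%d b=0x%08x\n", a, c, bits );
}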

The interesting parts of the output are

a=2147483520 c=2147483520 b=0x4effffff
a=2147483521 c=2147483520 b=0x4effffff
...
a=2147483582 c=2147483520 b=0x4effffff
a=2147483583 c=2147483520 b=0x4effffff
a=2147483584 c=-2147483648 b=0x4f000000
a=2147483585 c=-2147483648 b=0x4f000000
...
a=2147483645 c=-2147483648 b=0x4f000000
a=2147483646 c=-2147483648 b=0x4f000000

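Note that all 32-bit int values from 2147483584 to 2147483647 are rounded up to a float value of 2147483648. That value is one more than INT_MAX, so converting it back to int is out of range, which the C standard leaves undefined; the -2147483648 in the output is simply what this platform produced. The largest int value that rounds down is 2147483583, which is the same as (INT_MAX - 64) on a 32-bit system.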
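The 64 comes from the spacing between the two floats that bracket INT_MAX: they are 128 apart, so the halfway point lies 64 below 2147483648. As a rough illustration (not from the original answer), the neighbor below a float can be found by stepping its bit pattern down by one. The helper name prev_float is hypothetical, and the trick assumes an IEEE-754 binary32 float and a positive, finite, normal argument.

```c
#include <stdint.h>
#include <string.h>

/* Returns the largest float strictly below x.
   Assumptions: float is IEEE-754 binary32 and x is positive, finite and
   normal, so decrementing the bit pattern yields the previous value. */
float prev_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* read the encoding without aliasing issues */
    bits -= 1;                       /* 0x4f000000 becomes 0x4effffff */
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Under those assumptions, prev_float(2147483648.0f) yields 2147483520, matching the 0x4effffff and 0x4f000000 encodings in the output.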

One might conclude therefore that numbers at or below (INT_MAX - 64) will safely convert from int to float and back to int, in the sense that the result stays in range (though, as shown earlier, it is not necessarily exact). But that is only true on systems where an int is 32 bits and a float is encoded per IEEE-754.
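A more portable alternative to subtracting a fixed margin is to range-check the float before converting it back. The following is a minimal sketch, not taken from the original answer: the helper name float_to_int_checked is hypothetical, and it assumes a two's-complement int whose INT_MIN is a power of two that float represents exactly (true for a 32-bit int with an IEEE-754 float).

```c
#include <limits.h>

/* Converts f to int only when the truncated result is in range.
   Returns 1 and stores the result in *out on success; returns 0 for
   NaN, infinities, and values outside [INT_MIN, INT_MAX].
   Assumption: (float)INT_MIN is exact, so -(float)INT_MIN is INT_MAX + 1. */
int float_to_int_checked(float f, int *out)
{
    const float lower = (float)INT_MIN;   /* -2^31 for a 32-bit int */
    const float upper = -(float)INT_MIN;  /* 2^31, one past INT_MAX */

    /* Any comparison with NaN is false, so NaN is rejected here too. */
    if (!(f >= lower && f < upper))
        return 0;

    *out = (int)f;  /* truncates toward zero; known to be in range */
    return 1;
}
```

With a check like this, the INT_MAX case from the question reports failure instead of silently producing INT_MIN.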

user3386109