
I want to understand float serialization better. Why, in this example, do they multiply the mantissa by INT_MAX before casting to unsigned int?

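```c
#include <limits.h>
#include <math.h>

/* WriteInt, WriteUnsigned, ReadInt and ReadUnsigned are the example's own
   I/O helpers; they are assumed to be declared and defined elsewhere. */

void WriteFloat (float number)
{
  int exponent;
  unsigned long mantissa;

  /* Note: for a negative fraction, converting the scaled value to
     unsigned int is undefined behavior in C. */
  mantissa = (unsigned int) (INT_MAX * frexp(number, &exponent));

  WriteInt (exponent);
  WriteUnsigned (mantissa);
}

float ReadFloat ()
{
  int exponent = ReadInt();
  unsigned long mantissa = ReadUnsigned();

  float value = (float)mantissa / INT_MAX;

  return ldexp (value, exponent);
}
```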
Adrian Mole
Dylan Landry
  • FYI, this is a bad way to serialize floating-point values. As long as `float` is a binary floating-point format, either `ldexpf(mantissa, FLT_MANT_DIG)` or `scalbnf(mantissa, FLT_MANT_DIG)` will give the significand as an integer. (“Significand” is the preferred term; “mantissa” is an old word for the fraction part of a logarithm.) – Eric Postpischil Aug 28 '21 at 16:29
  • Dylan Landry, [Example](https://stackoverflow.com/a/5608466/7933478) uses `unsigned long mantissa;` instead of `unsigned mantissa;` for no apparent reason. All in all, that answer has multiple problems (not your fault). IMO, not a good answer. – chux - Reinstate Monica Aug 28 '21 at 16:35
  • Dylan Landry, serializing floating-point values as integers also incurs problems with [NAN](https://en.wikipedia.org/wiki/NaN), +/- infinity and -0.0. These are not addressed in the example code. Good serializing requires more than what you are reviewing. IMO, the simplest good serialization is `printf("%a", fp);` – chux - Reinstate Monica Aug 28 '21 at 17:33
  • @chux-ReinstateMonica I did some research on the `%a` approach and went with that, it was easier for me to understand. – Dylan Landry Aug 29 '21 at 14:01
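The `%a` approach the asker settled on can be sketched as a text round trip with `snprintf` and `strtof`. The two wrapper names are hypothetical. `%a` prints the exact binary value in hexadecimal (the `float` is promoted to `double` first), and `strtof` parses that text back without rounding; it also accepts the `inf` and `nan` spellings that `printf` produces for those values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: writes value as hexadecimal floating-point text.
   A 32-byte buffer is comfortably large enough for any float. */
void float_to_hex_text(float value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%a", value);
    assert(n > 0 && (size_t)n < size); /* text must not be truncated */
}

/* Hypothetical wrapper: parses the text back into a float. */
float float_from_hex_text(const char *buf)
{
    return strtof(buf, NULL);
}
```

Because the text carries the sign explicitly, -0.0 is written as a negative value and survives the trip, unlike in the integer scheme from the question.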

1 Answer

The frexp() function returns a 'normalized' value in the range (±)[0.5 – 1.0) (or exactly 0 for a zero argument). This range cannot be usefully represented in a variable of integral type: a simple cast truncates toward zero, so it would always yield zero, as the range does not include ±1.0. The value therefore has to be 'denormalized' (stretched) into a range that an integer can fully represent.
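To see the range and the truncation problem concretely, here is a minimal sketch; `fraction_of` and `naive_cast` are hypothetical helper names, not part of the original example.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: returns the fraction frexp() produces for x and
   stores the power-of-two exponent, so that x == fraction * 2^exponent.
   For finite non-zero x, |fraction| lies in [0.5, 1.0). */
double fraction_of(double x, int *exponent)
{
    assert(isfinite(x));
    return frexp(x, exponent);
}

/* Hypothetical helper: casts the unscaled fraction straight to an integer.
   Conversion truncates toward zero, and every fraction is strictly inside
   (-1.0, 1.0), so the result is always 0. */
long naive_cast(double x)
{
    int exponent;
    return (long)fraction_of(x, &exponent);
}
```

For example, 10.0 splits into 0.625 × 2^4 and -3.0 into -0.75 × 2^2; casting either fraction directly gives 0.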

Multiplying by INT_MAX will give (nearly) the greatest precision possible (assuming int and unsigned long have the same bit-width), without overflowing the range of the destination type (including the possibility of storing the representation of a negative value in that unsigned integer).
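A minimal sketch of the INT_MAX scheme, with the question's I/O calls replaced by a hypothetical in-memory struct. It deliberately deviates from the question's code in one way: the scaled fraction is kept in a signed `long`, because converting a negative `double` to an unsigned type is undefined behavior in C.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* Hypothetical stand-in for the WriteInt/WriteUnsigned pair. */
struct scaled_float {
    int exponent;
    long fraction; /* frexp() fraction multiplied by INT_MAX */
};

struct scaled_float encode(float number)
{
    struct scaled_float s;
    assert(isfinite(number));
    /* |fraction| < 1.0, so |fraction * INT_MAX| < INT_MAX fits in a long. */
    s.fraction = (long)(INT_MAX * frexp(number, &s.exponent));
    return s;
}

float decode(struct scaled_float s)
{
    /* Undo the stretch, then restore the power-of-two exponent. */
    return (float)ldexp((double)s.fraction / INT_MAX, s.exponent);
}
```

With a 32-bit `int` the scaled fraction carries about 31 bits, well above the 24 bits a `float` needs, so ordinary finite values round-trip; -0.0, infinities and NaN are still not handled, as the question's comments point out.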


Note: One could get more precision by storing the sign of the normalized fraction, then subtracting 0.5 from its absolute value, re-applying the sign and multiplying by 2.0 * INT_MAX (I think this will be safe) … but the precision gain (1 bit) is likely not worth the extra effort in coding (and decoding) the stored value.
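The note's idea can be sketched as follows. Two choices here go beyond the note and are assumptions: the sign is kept in a separate flag instead of being re-applied (re-applying it would make -0.5 and +0.5 both encode as 0), and zero is special-cased, since frexp(0.0) returns 0.0, which lies outside [0.5, 1.0). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdbool.h>

struct packed_fraction {
    bool negative;
    bool zero;
    int exponent;
    unsigned long bits; /* (|fraction| - 0.5) * 2.0 * INT_MAX, in [0, INT_MAX) */
};

struct packed_fraction pack(double x)
{
    struct packed_fraction p = {0};
    assert(isfinite(x));
    if (x == 0.0) {
        p.zero = true;
        return p;
    }
    double f = frexp(x, &p.exponent);
    p.negative = f < 0.0;
    /* Shift [0.5, 1.0) down to [0, 0.5), then stretch it over [0, INT_MAX). */
    p.bits = (unsigned long)((fabs(f) - 0.5) * 2.0 * INT_MAX);
    return p;
}

double unpack(struct packed_fraction p)
{
    if (p.zero)
        return 0.0;
    double f = 0.5 + (double)p.bits / (2.0 * INT_MAX);
    return ldexp(p.negative ? -f : f, p.exponent);
}
```

The gain is visible in the ranges: plain `f * INT_MAX` only uses [INT_MAX/2, INT_MAX), while the shifted value uses the whole [0, INT_MAX).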


On many platforms, the int and long types are the same size, but this is not required. As mentioned in the comments, using LONG_MAX as the multiplier/divisor could therefore offer greater precision; however, that may be overkill, depending on how many bits of mantissa there are in the source. An IEEE-754 single-precision float stores 23 mantissa bits (24 bits of precision, counting the implicit leading 1), so a 16-bit int type would lose precision, while a 64-bit LONG_MAX would be over-cooking it.
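Along the lines of the `ldexpf`/`FLT_MANT_DIG` suggestion in the question's comments and the fixed 2^24 multiplier mentioned below, here is a minimal sketch that does not depend on `INT_MAX` or `LONG_MAX` at all. It assumes a binary `float` (`FLT_RADIX == 2`) with a 24-bit significand; `encode_fixed` and `decode_fixed` are hypothetical names.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Scale the frexpf() fraction by 2^FLT_MANT_DIG. For IEEE-754 single
   precision FLT_MANT_DIG is 24, so the result is an exact integer with
   magnitude below 2^24, which fits in int32_t on every platform. */
void encode_fixed(float number, int32_t *significand, int *exponent)
{
    assert(isfinite(number));
    *significand = (int32_t)ldexpf(frexpf(number, exponent), FLT_MANT_DIG);
}

float decode_fixed(int32_t significand, int exponent)
{
    /* Scaling by a power of two is exact, so this restores the input. */
    return ldexpf((float)significand, exponent - FLT_MANT_DIG);
}
```

Because both ends use the same power-of-two scale and a fixed-width integer type, the writer and reader no longer need to agree on the width of `int`, and the round trip is exact for finite values, including subnormals.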

Adrian Mole
  • Disagree with "Multiplying by `INT_MAX` will give the greatest precision possible,". Multiplying by `INT_MAX + 1.0` or even `LONG_MAX + 1.0` (with 16-bit `int`) will give greater precision. – chux - Reinstate Monica Aug 28 '21 at 15:57
  • @chux On `LONG_MAX` - see edit. On the `+ 1` - wouldn't that (potentially) cause overflow/underflow issues for negative value data? – Adrian Mole Aug 28 '21 at 15:59
  • .. or for data that are at the *actual* +/- 1.0 limit? – Adrian Mole Aug 28 '21 at 16:09
  • `(±)[0.5 – 1.0]` is incorrect. It should be `(±)[0.5 – 1.0)`. Thus I see no overflow issue. – chux - Reinstate Monica Aug 28 '21 at 16:31
  • @chux Thanks for the correction (I wasn't actually sure, so I checked the Standard and edited my answer to use the correct range.) But using `INT_MAX + 1` isn't going to add that much precision, really - especially if we have 32-bit ints. However, you are technically correct, and the 16-bit int platforms will be awkward here. – Adrian Mole Aug 28 '21 at 16:42
  • There doesn't seem to be a Standard C `_I32_MAX` constant. – Adrian Mole Aug 28 '21 at 16:50
  • Another benefit of `INT_MAX plus 1` is that it is a power-of-2, thus incurring no rounding error in the `INT_MAX_PLUS1 * frexp(number, &exponent)` multiplication. I'd use `#define INT_MAX_PLUS1 ((INT_MAX/2 + 1)*2.0f)` and `frexpf()` instead of `frexp()`. – chux - Reinstate Monica Aug 28 '21 at 16:51
  • I've read the value of `INT_MAX` depends on the C implementation. Is it correct that if the serializing and unserializing machines' values for `INT_MAX` differ, the float values will vary drastically? Is this likely? Sorry for adding on questions. – Dylan Landry Aug 28 '21 at 17:13
  • @DylanLandry Yes - that *could* be a very nasty problem. A fixed multiplier (like 2^24 for 23-bit mantissa) would likely be better. But, as with the comments from chux, that's maybe more of a discussion for the original (linked) answer. – Adrian Mole Aug 28 '21 at 17:17
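chux's `INT_MAX_PLUS1` macro can be dropped into the same scheme. This sketch uses hypothetical function names and stores the scaled fraction in `long long` (an assumption, chosen so the value fits whatever the width of `int`); it shows that multiplying by a power of two keeps the fraction exact.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* chux's macro: INT_MAX + 1 computed without int overflow. With a 32-bit
   int it equals 2^31, a power of two, so the multiplication only shifts
   the exponent and never rounds. */
#define INT_MAX_PLUS1 ((INT_MAX/2 + 1)*2.0f)

long long encode_pow2(float number, int *exponent)
{
    assert(isfinite(number));
    return (long long)(INT_MAX_PLUS1 * frexpf(number, exponent));
}

float decode_pow2(long long fraction, int exponent)
{
    return ldexpf((float)fraction / INT_MAX_PLUS1, exponent);
}
```

Whatever constant is used, the reader must divide by exactly the value the writer multiplied by; if `INT_MAX` differs between the two machines, every decoded value is scaled wrongly, which is the portability problem raised in the last two comments.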