When serializing, why multiply mantissa by INT_MAX before casting to unsigned int?

Question

I want to understand float serialization better. Why, in this example, do they multiply the mantissa by INT_MAX before casting to unsigned int?

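```c
#include <limits.h>   /* INT_MAX */
#include <math.h>     /* frexp, ldexp */

/* WriteInt, WriteUnsigned, ReadInt and ReadUnsigned are the example's own
   stream helpers (not shown). */

void WriteFloat (float number)
{
  int exponent;
  unsigned long mantissa;

  mantissa = (unsigned int) (INT_MAX * frexp(number, &exponent));

  WriteInt (exponent);
  WriteUnsigned (mantissa);
}

float ReadFloat ()
{
  int exponent = ReadInt();
  unsigned long mantissa = ReadUnsigned();

  float value = (float)mantissa / INT_MAX;

  return ldexp (value, exponent);
}
```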

FYI, this is a bad way to serialize floating-point values. As long as `float` is a binary floating-point format, either `ldexpf(mantissa, FLT_MANT_DIG)` or `scalbnf(mantissa, FLT_MANT_DIG)` will give the significand as an integer. (“Significand” is the preferred term; “mantissa” is an old word for the fraction part of a logarithm.) — Eric Postpischil, Aug 28 '21 at 16:29
Dylan Landry, [Example](https://stackoverflow.com/a/5608466/7933478) uses `unsigned long mantissa;` instead of `unsigned mantissa;` for no apparent reason. All in all, that answer has multiple problems (not your fault). IMO, not a good answer. — chux - Reinstate Monica, Aug 28 '21 at 16:35
Dylan Landry, serializing floating-point values as integers also incurs problems with [NAN](https://en.wikipedia.org/wiki/NaN), +/- infinity and -0.0. These are not addressed in the example code. Good serialization requires more than what you are reviewing. IMO, the simplest good serialization is `printf("%a", fp);` — chux - Reinstate Monica, Aug 28 '21 at 17:33
@chux-ReinstateMonica I did some research on the `%a` approach and went with that, it was easier for me to understand. — Dylan Landry, Aug 29 '21 at 14:01
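The `printf("%a", fp)` approach recommended in the comments can be sketched as below. `%a` with no precision prints the exact binary value as a hexadecimal floating constant, and `strtof()` parses it back, so finite values should round-trip exactly, and infinities, NaNs (though not necessarily their payloads) and signed zeros come back too. This assumes a C99-or-later C library; the helper names `float_to_text`, `text_to_float` and `same_float` are hypothetical.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes x as a hexadecimal floating constant.
   printf() receives the float promoted to double; "%a" with no precision
   prints that value exactly. */
static void float_to_text(float x, char *buf, size_t size)
{
    snprintf(buf, size, "%a", (double)x);
}

/* Hypothetical helper: parses text produced by float_to_text(). strtof()
   accepts hexadecimal constants as well as the inf/infinity and nan
   spellings that printf() uses. */
static float text_to_float(const char *text)
{
    return strtof(text, NULL);
}

/* Equality check for round-trip tests: any NaN matches any NaN, and
   -0.0 is distinguished from +0.0 (plain == treats them as equal). */
static int same_float(float a, float b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b && (signbit(a) != 0) == (signbit(b) != 0);
}
```

A text format also sidesteps the question of `INT_MAX` differing between machines, at the cost of a longer encoding than two integers.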

Adrian Mole · Accepted Answer · 2021-08-28T16:40:00.750

The frexp() function returns a 'normalized' value in the range (±)[0.5 – 1.0). Clearly, this is not a range that can be properly represented in a variable of integral type (a simple cast of that value would always yield zero, as the range does not include ±1.0), so it has to be 'denormalized' (stretched) into a range that is fully representable.
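To make the range claim concrete, here is a small hypothetical helper (`cast_loses_everything` is not from the original answer, and uses only standard `<math.h>`) that checks what frexp() returns and what a plain integer cast of that fraction produces.

```c
#include <math.h>

/* Hypothetical helper: for finite, nonzero x, checks that frexp() returns a
   fraction with 0.5 <= |frac| < 1, that ldexp() rebuilds x exactly from the
   two parts, and that casting the fraction straight to an integer
   truncates it to 0. */
static int cast_loses_everything(double x)
{
    int exponent;
    double frac = frexp(x, &exponent);
    int in_range = fabs(frac) >= 0.5 && fabs(frac) < 1.0;
    int rebuilds = ldexp(frac, exponent) == x;
    return in_range && rebuilds && (long)frac == 0;
}
```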

Multiplying by INT_MAX will give (nearly) the greatest precision possible (assuming int and unsigned long have the same bit-width)^†, without overflowing the range of the destination type (including the possibility of storing the representation of a negative value in that unsigned integer).
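A minimal sketch of the INT_MAX scaling, assuming a 32-bit int (long is always at least 32 bits). One deliberate change from the question's code: the scaled fraction is kept in a signed long, because converting a negative floating value directly to an unsigned integer type is undefined behaviour in C. The names `encode_int_max` and `decode_int_max` are hypothetical.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper: encodes x as (exponent, fraction scaled by INT_MAX).
   The scaled value stays in a signed long because converting a negative
   floating value to an unsigned type is undefined behaviour in C. */
static void encode_int_max(float x, int *exponent, long *scaled)
{
    double frac = frexp(x, exponent);   /* 0.5 <= |frac| < 1, or 0 for x == 0 */
    *scaled = (long)(frac * INT_MAX);   /* |frac * INT_MAX| <= INT_MAX; truncates toward 0 */
}

/* Hypothetical inverse of encode_int_max(). Dividing and rescaling in double,
   then rounding to float once, is meant to land back on the original float
   when int has 32 bits. */
static float decode_int_max(int exponent, long scaled)
{
    return (float)ldexp((double)scaled / INT_MAX, exponent);
}
```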

Note: One could get more precision by storing the sign of the normalized fraction, then subtracting 0.5 from its absolute value, re-applying the sign and multiplying by 2.0 * INT_MAX (I think this will be safe) … but the precision gain (1 bit) is likely not worth the extra effort in coding (and decoding) the stored value.

^† On many platforms, the int and long types are the same size. However, this is not required, so, as mentioned in the comments, using LONG_MAX as the multiplier/divisor could offer greater precision. That may be overkill, though, depending on how many significand bits the source type has. An IEEE-754 single-precision float has a 24-bit significand (23 explicitly stored bits plus an implicit leading 1), so a 16-bit int type would lose out, but a 64-bit LONG_MAX would be over-cooking it.

edited Aug 28 '21 at 16:40

answered Aug 28 '21 at 15:43

Adrian Mole

Disagree with "Multiplying by `INT_MAX` will give the greatest precision possible,". Multiplying by `INT_MAX + 1.0` or even `LONG_MAX + 1.0` (with 16-bit `int`) will give greater precision. – chux - Reinstate Monica Aug 28 '21 at 15:57
@chux On `LONG_MAX` - see edit. On the `+ 1` - wouldn't that (potentially) cause overflow/underflow issues for negative value data? – Adrian Mole Aug 28 '21 at 15:59
.. or for data that are at the *actual* +/- 1.0 limit? – Adrian Mole Aug 28 '21 at 16:09
`(±)[0.5 – 1.0]` is incorrect. Should be `(±)[0.5 – 1.0)`. Thus I see no overflow issue. – chux - Reinstate Monica Aug 28 '21 at 16:31
@chux Thanks for the correction (I wasn't actually sure, so I checked the Standard and edited my answer to use the correct range.) But using `INT_MAX + 1` isn't going to add that much precision, really - especially if we have 32-bit ints. However, you are technically correct, and the 16-bit int platforms will be awkward here. – Adrian Mole Aug 28 '21 at 16:42
There doesn't seem to be a Standard C `_I32_MAX` constant. – Adrian Mole Aug 28 '21 at 16:50
Another benefit of `INT_MAX plus 1` is that it is a power of 2, thus incurring no rounding error in the `INT_MAX_PLUS1 * frexp(number, &exponent)` multiplication. I'd use `#define INT_MAX_PLUS1 ((INT_MAX/2 + 1)*2.0f)` and `frexpf()` instead of `frexp()`. – chux - Reinstate Monica Aug 28 '21 at 16:51
I've read the value of `INT_MAX` depends on the C implementation. Is it correct that if the serializing and deserializing machines' values for `INT_MAX` differ, the float values will vary drastically? Is this likely? Sorry for adding on questions. – Dylan Landry Aug 28 '21 at 17:13
@DylanLandry Yes - that *could* be a very nasty problem. A fixed multiplier (like 2^24 for 23-bit mantissa) would likely be better. But, as with the comments from chux, that's maybe more of a discussion for the original (linked) answer. – Adrian Mole Aug 28 '21 at 17:17
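Following the fixed-multiplier idea, here is a sketch that scales by 2^FLT_MANT_DIG (2^24 for IEEE-754 single precision) and stores both fields as `int32_t`, so the encoding no longer depends on the width of `int` on either machine. Assumptions: a binary (`FLT_RADIX == 2`) `float`, a C11 compiler for `_Static_assert`, and finite input only; NaN, infinities and the sign of -0.0 are not handled, as noted in the question's comments. The names `encode_fixed` and `decode_fixed` are hypothetical. Because the multiplier is a power of two and the fraction carries at most FLT_MANT_DIG bits, the scaled value is an exact integer and the round trip should be lossless.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* The significand magnitude is below 2^FLT_MANT_DIG, so it must fit int32_t. */
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG <= 31,
               "sketch assumes a binary float with at most 31 significand bits");

/* Hypothetical helper: splits a finite x into an exponent and an integer
   significand. frexpf() gives 0.5 <= |frac| < 1 (or 0), and scaling by a
   power of two with ldexpf() is exact, so the product is a whole number. */
static void encode_fixed(float x, int32_t *exponent, int32_t *significand)
{
    int e;
    float frac = frexpf(x, &e);
    *significand = (int32_t)ldexpf(frac, FLT_MANT_DIG);
    *exponent = (int32_t)e;
}

/* Hypothetical inverse: significand * 2^(exponent - FLT_MANT_DIG). The
   significand converts to float exactly, and the result is the original
   (representable) value, so ldexpf() introduces no rounding. */
static float decode_fixed(int32_t exponent, int32_t significand)
{
    return ldexpf((float)significand, (int)exponent - FLT_MANT_DIG);
}
```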