
When converting a number from half- to single-precision floating-point representation, I see a change in the numeric value.

Here I have 65500 stored as a half-precision float, but upgrading to single precision changes the underlying value to 65504, which is many floating-point increments away from the target.

In this specific case, why does this happen?

(Pdb) np.asarray(65500,dtype=np.float16).astype(np.float32)
array(65504., dtype=float32)

As a side note, I also observe

(Pdb) int(np.finfo(np.float16).max)
65504
Mikhail
  • The precision of half-precision float can't distinguish `65500` and `65504`. – Barmar Jul 08 '21 at 00:12
  • 65504 is only 0.006% different from 65500. Your title *"substantially modify"* is really badly misleading and alarming, it suggests np is broken or something. – smci Jul 08 '21 at 03:12
  • (By the way, you could store this exactly as a 16-bit (unsigned) integer. If you don't need float, don't use float.) – smci Jul 08 '21 at 03:14
  • Your precision loss happened when you converted to float16, not when you converted to float32. The conversion to float32 just changes the printing handling so you see the precision loss. – user2357112 Jul 08 '21 at 03:19
  • @user2357112supportsMonica I don't think this is true because you can't even represent 65504 in float16... For example, int(np.finfo(np.float16).max) exceeds the max value (excluding inf, etc.) – Mikhail Jul 08 '21 at 06:22
  • @Mikhail: "you can't even represent 65504 in float16" - Yes you can, and you did. The display logic just doesn't display the exact value. – user2357112 Jul 08 '21 at 06:25
  • @user2357112supportsMonica It exceeds np.finfo(np.float16).max ... – Mikhail Jul 08 '21 at 06:27
  • You know how the float64 value displayed as `0.3` actually has an exact value of 0.299999999999999988897769753748434595763683319091796875? Same thing here. You're interpreting the displayed value as the exact value. – user2357112 Jul 08 '21 at 06:28
  • @Mikhail: `np.finfo(np.float16).max` _is_ 65504.0. Try executing `np.finfo(np.float16).max == 65504` and `np.finfo(np.float16).max == 65500` at a prompt. They should return `True` and `False`, respectively. – Mark Dickinson Jul 08 '21 at 06:49
  • @MarkDickinson Yeah I think you're right. Part of the confusion is that `(Pdb) np.finfo(np.float16).max` gives `65500.0`, although https://evanw.github.io/float-toy/ confirms the maximum possible number. I guess the underlying issue is the printing behavior in numpy... – Mikhail Jul 08 '21 at 06:52

1 Answer


The error is not "many floating-point increments away" in half precision. Read the IEEE 754-2008 standard: binary16 stores 10 bits for the significand (mantissa), or 1024 distinct fraction values, plus an implicit leading 1. Your value lies just below 2^16, in the binade [2^15, 2^16), so the increment between adjacent representable values there is 2^15 / 2^10 = 2^5, or 32.
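
You can see that 32-wide spacing directly. Here is a minimal check (a sketch, assuming a reasonably recent NumPy); only integers that land on a representable multiple of 32 survive the round trip through float16 unchanged:

import numpy as np

# Near 2**16, adjacent float16 values are 32 apart, so nearby integers
# collapse onto the closest representable multiple of 32.
for n in (65472, 65490, 65500, 65504):
    print(n, float(np.float16(n)))
# 65472 65472.0   (2**5 * 2046, exactly representable)
# 65490 65504.0   (rounds to the nearest representable value)
# 65500 65504.0   (rounds to the nearest representable value)
# 65504 65504.0   (2**5 * 2047, the largest finite float16)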

The format also gives 1 bit for the sign and 5 for the characteristic (exponent).

65500 is stored as the nearest representable half-precision value, + 2^5 * 2047, which is exactly 65504. That value converts exactly to float32, so you see 65504. You lost the precision when you rounded your larger number to float16's 10 (plus one implicit) bits of precision, not when you widened it. When you convert in either direction, the result is always constrained by the less-precise type.
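
As a concrete illustration (again a sketch, assuming a recent NumPy), you can inspect the stored bit pattern and confirm that the widening step itself is exact:

import numpy as np

x = np.float16(65500)                 # the rounding happens here
print(x == np.finfo(np.float16).max)  # True: 65500 rounded up to 65504
print(hex(x.view(np.uint16)))         # 0x7bff: sign 0, exponent 11110, fraction 1111111111
print(np.float32(x))                  # 65504.0 -- converting float16 -> float32 is exact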

Prune
  • To clarify, "larger than the smallest increment" in the destination range (32-bit). I don't believe it's possible to represent values above the max value in the source (16-bit) range. – Mikhail Jul 08 '21 at 00:13
  • @Mikhail: "16 bit float" doesn't mean "all 16 bits get used for mantissa". – smci Jul 08 '21 at 03:17
  • I think the spacing between floats at the upper end of the float16 range is 32 rather than 64. – Mark Dickinson Jul 08 '21 at 06:58