
When converting a number from half- to single-precision floating-point representation, I see a change in the numeric value.

Here I have 65500 stored as a half-precision float, but upgrading to single precision changes the underlying value to 65504, which is many floating-point increments away from the target.

In this specific case, why does this happen?

(Pdb) np.asarray(65500,dtype=np.float16).astype(np.float32)
array(65504., dtype=float32)

As a side note, I also observe

(Pdb) int(np.finfo(np.float16).max)
65504
Mikhail
  • The precision of half-precision float can't distinguish `65500` and `65504`. – Barmar Jul 08 '21 at 00:12
  • 65504 is only 0.006% different from 65500. Your title *"substantially modify"* is really badly misleading and alarming, it suggests np is broken or something. – smci Jul 08 '21 at 03:12
  • (By the way, you could store this exactly as a 16-bit (unsigned) integer. If you don't need float, don't use float.) – smci Jul 08 '21 at 03:14
  • Your precision loss happened when you converted to float16, not when you converted to float32. The conversion to float32 just changes the printing handling so you see the precision loss. – user2357112 Jul 08 '21 at 03:19
  • @user2357112supportsMonica I don't think this is true because you can't even represent 65504 in float16... For example, int(np.finfo(np.float16).max) exceeds the max value (excluding inf, etc.) – Mikhail Jul 08 '21 at 06:22
  • @Mikhail: "you can't even represent 65504 in float16" - Yes you can, and you did. The display logic just doesn't display the exact value. – user2357112 Jul 08 '21 at 06:25
  • @user2357112supportsMonica It exceeds np.finfo(np.float16).max ... – Mikhail Jul 08 '21 at 06:27
  • You know how the float64 value displayed as `0.3` actually has an exact value of 0.299999999999999988897769753748434595763683319091796875? Same thing here. You're interpreting the displayed value as the exact value. – user2357112 Jul 08 '21 at 06:28
  • @Mikhail: `np.finfo(np.float16).max` _is_ 65504.0. Try executing `np.finfo(np.float16).max == 65504` and `np.finfo(np.float16).max == 65500` at a prompt. They should return `True` and `False`, respectively. – Mark Dickinson Jul 08 '21 at 06:49
  • @MarkDickinson Yeah I think you're right. Part of the confusion is that `(Pdb) np.finfo(np.float16).max` gives `65500.0`, although https://evanw.github.io/float-toy/ confirms the maximum possible number. I guess the underlying issue is the printing behavior in numpy... – Mikhail Jul 08 '21 at 06:52

1 Answer


The error is not "many floating-point increments away" in half precision. Read the IEEE 754-2008 standard: binary16 stores 10 bits for the significand (mantissa), or 1024 distinct fraction values, plus an implicit leading 1. Your value lies just below 2^16, in the binade [2^15, 2^16), so the increment between adjacent representable values there is 2^15 / 2^10 = 2^5, or 32.
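
You can see that 32-wide spacing directly. Here is a minimal check (a sketch, assuming a reasonably recent NumPy); only integers that land on a representable multiple of 32 survive the round trip through float16 unchanged:

import numpy as np

# Near 2**16, adjacent float16 values are 32 apart, so nearby integers
# collapse onto the closest representable multiple of 32.
for n in (65472, 65490, 65500, 65504):
    print(n, float(np.float16(n)))
# 65472 65472.0   (2**5 * 2046, exactly representable)
# 65490 65504.0   (rounds to the nearest representable value)
# 65500 65504.0   (rounds to the nearest representable value)
# 65504 65504.0   (2**5 * 2047, the largest finite float16)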

The format also gives 1 bit for the sign and 5 for the characteristic (exponent).

65500 is stored as the nearest representable half-precision value, + 2^5 * 2047, which is exactly 65504. That value converts exactly to float32, so you see 65504. You lost the precision when you rounded your larger number to float16's 10 (plus one implicit) bits of precision, not when you widened it. When you convert in either direction, the result is always constrained by the less-precise type.
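
As a concrete illustration (again a sketch, assuming a recent NumPy), you can inspect the stored bit pattern and confirm that the widening step itself is exact:

import numpy as np

x = np.float16(65500)                 # the rounding happens here
print(x == np.finfo(np.float16).max)  # True: 65500 rounded up to 65504
print(hex(x.view(np.uint16)))         # 0x7bff: sign 0, exponent 11110, fraction 1111111111
print(np.float32(x))                  # 65504.0 -- converting float16 -> float32 is exact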

Prune
  • To clarify, "larger than the smallest increment" in the destination range (32-bit). I don't believe it's possible to represent values above the max value in the source (16-bit) range. – Mikhail Jul 08 '21 at 00:13
  • @Mikhail: "16 bit float" doesn't mean "all 16 bits get used for mantissa". – smci Jul 08 '21 at 03:17
  • I think the spacing between floats at the upper end of the float16 range is 32 rather than 64. – Mark Dickinson Jul 08 '21 at 06:58