I am struggling to convert a 32-bit floating-point value to a 16-bit floating-point value in C.
I understand the concept of normalizing, denormalizing, etc.
But I fail to understand the result below.
The conversion complies with the IEEE 754 standard (using round-to-nearest-even mode).
32-bit floating point:
00110011 01000000 00000000 00000000
Converted 16-bit floating point:
00000000 00000001
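
For reference, here is a minimal C sketch of how one could reproduce this (it assumes a compiler with _Float16 support, such as a recent GCC or Clang, and the default round-to-nearest-even mode; the variable names are just for illustration):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* Reinterpret the given pattern 00110011 01000000 00000000 00000000 as a float. */
        uint32_t bits32 = 0x33400000u;
        float f;
        memcpy(&f, &bits32, sizeof f);
        printf("float value: %a\n", f);        /* 0x1.8p-25, i.e. 1.1 (binary) * 2^-25 */

        /* Cast to half precision; assumes _Float16 is available and the
           default IEEE 754 round-to-nearest-even rounding is in effect. */
        _Float16 h = (_Float16)f;
        uint16_t bits16;
        memcpy(&bits16, &h, sizeof bits16);
        printf("half bits: 0x%04x\n", bits16); /* the result quoted above is 0x0001 */
        return 0;
    }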
These are the steps I've taken.
The given 32-bit float's sign bit is 0, its exponent field is 102, and the rest is the fraction field.
Applying the -127 bias to the exponent field 102 gives -25, so the value is:
// since the exp field is not zero, there is an implicit leading 1.
1.1000000 00000000 00000000 * 2^(-25)
When converting this number to half precision, we have to add the bias (15) to the exponent to encode the exponent field, which gives -10.
Since the encoded exponent field is less than 0, I concluded that the given 32-bit value cannot be represented in half precision at all.
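
To make my reasoning concrete, this is a minimal sketch of the conversion as I understood it (the function name and structure are mine, purely for illustration): it flushes anything whose re-biased exponent is below 1 to zero and ignores half-precision subnormals and rounding.

    #include <stdint.h>

    /* Sketch of my mental model of float -> half conversion.
     * It flushes to signed zero whenever the re-biased exponent is < 1,
     * ignoring subnormal results and rounding. */
    static uint16_t float_to_half_naive(uint32_t bits32)
    {
        uint16_t sign = (uint16_t)((bits32 >> 16) & 0x8000u);
        int32_t  exp  = (int32_t)((bits32 >> 23) & 0xFFu) - 127; /* unbias: 102 - 127 = -25 here */
        uint32_t frac = bits32 & 0x007FFFFFu;

        int32_t half_exp = exp + 15;            /* re-bias for half precision: -25 + 15 = -10 */
        if (half_exp <= 0)
            return sign;                        /* my assumption: underflow -> signed zero */
        if (half_exp >= 31)
            return (uint16_t)(sign | 0x7C00u);  /* overflow -> infinity (NaN not handled) */

        return (uint16_t)(sign | ((uint32_t)half_exp << 10) | (frac >> 13)); /* truncate fraction */
    }

For the input 0x33400000 this returns 0x0000, which is the pattern I expected, not the 0x0001 that the compliant conversion produces.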
So I thought the half-precision bit pattern would be
00000000 00000000
But why is it 00000000 00000001?
I have read many posts on Stack Overflow, but they are just code samples and do not actually explain the internal behavior.
Can someone please point out my misconception?