Half-precision floating-point

Question

I have a small question about Half-precision IEEE-754.

1) I have the following exercise: 13.7625 shall be written in 16 bits (half precision).

So I started by converting the number from decimal to binary, and I got 13.7625 = 1101.1100001100₂

All in all, it would be 1.1011100001100 * 2³.

The sign bit is 0 because the number is positive.
The mantissa has ten bits = 101 110 0001
The exponent has five bits = bias (15) + 3 = 18, thus the exponent is 10010, and here is the damn problem.

My professor gave us the solution. As far as I know, I did the mantissa and the binary conversion right, but for the exponent he states that it's 19 = 10011, and I don't get it. Can the bias be 16? According to Wikipedia it's 15 for half precision, 127 for single precision, and 1032 for double precision.

Can you please point out what I did wrong?

2) One other question: what would the exponent bias be in the following situation: 1 sign bit + 4 mantissa bits + 3 exponent bits? And why?

thanks.

Alain Merigot · Accepted Answer · 2019-06-29T01:15:14.530

1) I have the following exercise: 13.7625 shall be written in 16 bits (half precision).

So I started by converting the number from decimal to binary, and I got 13.7625 = 1101.1100001100₂

Your mantissa conversion is correct, and so is your exponent. The exponent bias for half precision is 15: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
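As a quick sanity check, here is a small sketch using Python's `struct` module, whose `e` format packs a float as IEEE 754 binary16. The helper name `half_fields` is made up for this example. Note that `struct` rounds to nearest, so the last stored fraction bit differs from the truncated mantissa 1011100001 given in the question; the exponent field is the point of interest here.

```python
import struct

def half_fields(x):
    """Split x, converted to IEEE 754 binary16, into (sign, exponent, fraction) bit strings."""
    # '>e' packs a float as big-endian binary16 (rounding to nearest);
    # '>H' reads the same two bytes back as an unsigned 16-bit integer.
    (bits,) = struct.unpack(">H", struct.pack(">e", x))
    sign = bits >> 15                 # 1 sign bit
    exponent = (bits >> 10) & 0x1F    # 5-bit biased exponent
    fraction = bits & 0x3FF           # 10-bit stored fraction
    return f"{sign:01b}", f"{exponent:05b}", f"{fraction:010b}"

sign, exponent, fraction = half_fields(13.7625)
print(sign, exponent, fraction)       # prints: 0 10010 1011100010
print(int(exponent, 2))               # prints: 18, i.e. 3 + bias 15
```

The exponent field comes out as 10010 (18), not 10011 (19).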

One other question: what would the exponent bias be in the following situation: 1 sign bit + 4 mantissa bits + 3 exponent bits? And why?

The rule for IEEE-754 FP coding is that, if the exponent is coded with n bits, the bias is 2ⁿ⁻¹-1. This applies to single precision (8 bits, bias 2⁷-1 = 127), double precision (11 bits, bias 2¹⁰-1 = 1023, and not 1032; there is a small typo in the question), etc.
For an exponent field of 3 bits, this gives a bias of 2²-1 = 3.
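The rule is easy to tabulate. A minimal sketch (the function name `ieee_bias` is just an illustrative choice):

```python
def ieee_bias(exponent_bits):
    """Bias for an IEEE 754 style exponent field of the given width: 2^(n-1) - 1."""
    return 2 ** (exponent_bits - 1) - 1

# Exponent widths: the 3-bit case from the question, then half, single, double.
for name, n in [("3-bit exponent", 3), ("half", 5), ("single", 8), ("double", 11)]:
    print(f"{name}: bias {ieee_bias(n)}")
# prints biases 3, 15, 127 and 1023 respectively
```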

For your coding problem, this would give an exponent code of 3+3 = 6 = 110. For the mantissa, it depends on the rounding policy. If the mantissa is rounded towards 0, we can code 1.1011(100001100) by just dropping the trailing bits, and the final code would be
0.110.1011.

But the rounding error is then slightly greater than 0.5 ULP (0.1000011₂ ULP, about 0.52 ULP), and to minimize it, 1.10111000011 should be rounded to 4 bits by adding 1 to the ULP.

  1.1011 
+      1
= 1.1100

and the final code would be 0.110.1100
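Both rounding policies can be reproduced with a short sketch. The helper `encode_minifloat` is hypothetical and only covers this exercise: it assumes a positive normal value and ignores zero, subnormals, infinities, NaN and exponent overflow. It uses exact fractions so the decimal input is not disturbed by binary floating-point.

```python
from fractions import Fraction
import math

def encode_minifloat(x, exp_bits=3, man_bits=4, round_nearest=True):
    """Encode a positive normal value as 'sign.exponent.mantissa' (illustrative only)."""
    x = Fraction(x)
    bias = 2 ** (exp_bits - 1) - 1
    e = 0
    while x >= 2:                     # normalize to 1 <= x < 2
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    scaled = (x - 1) * 2 ** man_bits  # fraction measured in ULPs
    frac = math.floor(scaled)         # truncation = rounding towards 0
    rem = scaled - frac               # dropped part, in ULPs
    if round_nearest and (rem > Fraction(1, 2) or (rem == Fraction(1, 2) and frac % 2)):
        frac += 1                     # round to nearest, ties to even
    if frac == 2 ** man_bits:         # mantissa overflowed: carry into the exponent
        frac = 0
        e += 1
    return f"0.{e + bias:0{exp_bits}b}.{frac:0{man_bits}b}"

print(encode_minifloat("13.7625", round_nearest=False))  # prints: 0.110.1011
print(encode_minifloat("13.7625"))                       # prints: 0.110.1100
```

With truncation the decoded value is 1.6875 × 8 = 13.5, and with rounding to nearest it is 1.75 × 8 = 14, which is slightly closer to 13.7625.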

Thanks for answering, but the second question has nothing to do with the first one. I don't have to round anything because the professor didn't ask us to fit the answer into 8 bits. Can it be that the professor made a mistake and told us that the real answer for the exponent is 19 = 10011? Thanks for the explanation. It's written in the lecture PDF online, so I guess she made a mistake about the 19 thing! — StudentAccount4, Jun 29 '19 at 17:05
Can you please explain the outcome of this exercise, then? The professor gave us ``4.625=100.101=1.00101*2²`` and then she wrote ``exponent=110``. Please explain what this professor is doing. I've started to think that she is just writing the number in excess notation without adding (or subtracting, when needed) the bias. — StudentAccount4, Jun 29 '19 at 21:29
I also think you are right. In an 8-bit minifloat, the exponent should be 3+2 = 5 and not 6. But it is probably just a careless mistake. Ask her directly. — Alain Merigot, Jun 30 '19 at 07:28
