
I am looking for suggestions: how should the IEEE 754 single-precision floating-point representation, i.e. the following:

$(-1)^{\text{sign}} \times 2^{\text{exponent}-127} \times 1.\text{mantissa}_2$

where the hidden bit is worth 1.0, be accurately and efficiently converted to a custom floating-point format like the following:

$(-1)^{\text{sign}} \times 2^{\text{exponent}-128} \times 0.1\text{mantissa}_2$

where the hidden bit is worth 0.5?

I am not asking anyone to do this work for me; I am mostly looking for suggestions on how to do it correctly and as accurately as possible.

Akay

1 Answer


The bits that represent some value x in the first scheme represent x/4 in the second scheme: the significand 0.1mantissa is half of 1.mantissa, and the larger bias (128 instead of 127) halves the value again. So, to represent x in the second scheme, one normally increases the exponent by two. Then there are just the abnormal cases to deal with:

  • If the exponent is 255, the object is infinity or NaN. Return it unchanged.
  • If the exponent is 253 or 254, it cannot be increased by two, so the result is infinity.
  • If the exponent is 0, the number is subnormal (or zero). If the two high bits of the significand field are 00, simply shift the significand left two bits and leave the exponent at 0. Otherwise, if the bits are 01, shift the significand left two bits (discarding the shifted-out bits) and set the exponent to 1. Otherwise (the high bit is 1), shift the significand left one bit (discarding the shifted-out bit) and set the exponent to 2.
  • Otherwise, add two to the exponent.
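
The cases above could be sketched in C roughly as follows. This is only a sketch under assumptions not stated in the question: that the custom format keeps the same 1/8/23 sign/exponent/significand field layout as IEEE 754 single precision, that exponent 255 still encodes infinity and NaN, and that exponent 0 is subnormal (no hidden bit). The function name `ieee_to_custom` is made up for illustration, and it works on the raw 32-bit pattern (which can be obtained from a `float` with `memcpy`).

```c
#include <stdint.h>

/* Hypothetical helper: re-encode the bit pattern of an IEEE 754 binary32
 * value so that it denotes the same value in the custom format.
 * Assumes the custom format uses the same 1/8/23 field layout. */
uint32_t ieee_to_custom(uint32_t bits)
{
    uint32_t sign = bits & 0x80000000u;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x007FFFFFu;

    if (exp == 255)                 /* infinity or NaN: return unchanged */
        return bits;

    if (exp >= 253)                 /* 253 or 254: exponent + 2 overflows */
        return sign | 0x7F800000u;  /* signed infinity */

    if (exp == 0) {                 /* zero or subnormal */
        uint32_t top2 = frac >> 21; /* two high bits of the significand field */
        if (top2 == 0)              /* 00: still subnormal after scaling by 4 */
            return sign | (frac << 2);
        if (top2 == 1)              /* 01: becomes normal with exponent 1 */
            return sign | (1u << 23) | ((frac << 2) & 0x007FFFFFu);
        /* 10 or 11: becomes normal with exponent 2 */
        return sign | (2u << 23) | ((frac << 1) & 0x007FFFFFu);
    }

    return bits + (2u << 23);       /* normal case: add two to the exponent */
}
```

For example, 1.0f has the bit pattern 0x3F800000 (exponent field 127), and the sketch maps it to 0x40800000 (exponent field 129). No rounding is ever needed, since every case only shifts zeros into the low end of the significand or overflows to infinity.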
Eric Postpischil