What are the differences/similarities between the fixed point (bit-level) representation of a value in [0,1] compared to its floating point (bit-level) value?
1 Answer
In a fixed-point representation, each bit represents a fixed value. For example, in a simple binary integer format, the lowest (least significant) bit represents 1, the next represents 2, the next represents 4, then 8, and so on. The number represented is the sum of the values of the set bits. (I will omit discussion of the sign bit and two’s complement or other choices.)
For other fixed-point representations, the values are scaled by a fixed amount. For example, in a Q.8 format, each bit would have 1/256th (2^−8) of the value it has in normal integer scaling. So the low bit would represent 2^−8, the next 2^−7, and so on.
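As a rough illustration of the fixed scaling, here is a minimal Python sketch. It assumes an unsigned format with 8 bits to the right of the binary point; the names `FRACTION_BITS`, `to_fixed`, and `bit_values` are made up for this example and do not come from any library.

```python
FRACTION_BITS = 8  # assumption: 8 bits to the right of the binary point


def to_fixed(x, fraction_bits=FRACTION_BITS):
    """Scale x by 2**fraction_bits and round to an integer (Python's round, ties to even)."""
    return round(x * (1 << fraction_bits))


def bit_values(raw, fraction_bits=FRACTION_BITS):
    """List the value contributed by each set bit of raw, lowest bit first."""
    return [2.0 ** (i - fraction_bits)
            for i in range(raw.bit_length())
            if (raw >> i) & 1]


raw = to_fixed(0.75)
print(bin(raw))              # 0b11000000
print(bit_values(raw))       # [0.25, 0.5]
print(sum(bit_values(raw)))  # 0.75
```

Bit i always contributes 2^(i−8) regardless of which number is stored, which is the sense in which the point is fixed.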
For floating-point representations, the values of the bits float. They are determined by an exponent value. The bits are partitioned into bits that represent the main value (called the significand, also called the fraction portion or, in legacy documents, the mantissa) and bits that represent the exponent, along with a bit for the sign. The exponent bits typically use a binary integer format along with some fixed bias (for example, take the binary integer represented by the exponent bits and subtract 127 to get the value represented for the exponent). Also, some values of the exponent bits may be reserved for special cases, such as infinities, NaNs, and subnormal numbers.
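To make the partition concrete, the following sketch pulls the three fields out of a value stored in the IEEE 754 binary32 format (1 sign bit, 8 exponent bits with bias 127, 23 explicit significand bits). The choice of binary32 and the function name `fields_binary32` are assumptions for illustration; other formats use different widths and biases.

```python
import struct


def fields_binary32(x):
    """Round x to binary32 and return (sign, exponent field, fraction field)."""
    # Pack as a big-endian float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent_field = (bits >> 23) & 0xFF  # biased exponent, 0..255
    fraction_field = bits & 0x7FFFFF      # the 23 explicit significand bits
    return sign, exponent_field, fraction_field


sign, exponent_field, fraction_field = fields_binary32(0.75)
print(sign, exponent_field, exponent_field - 127, hex(fraction_field))
# 0 126 -1 0x400000
```

In binary32, the exponent fields 0 and 255 are the reserved ones: 0 encodes zeros and subnormal numbers, and 255 encodes infinities and NaNs.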
Once the exponent value e is determined, the significand bits have values that are scaled by 2^e. Commonly, there is an implicit bit with value 2^e, the highest explicit bit has value 2^(e−1), the next 2^(e−2), and so on.
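The scaling of the significand bits can be sketched the same way. This example again assumes IEEE 754 binary32, handles only normal numbers, and ignores the sign; `significand_bit_values` is a hypothetical helper name, not a library function.

```python
import struct


def significand_bit_values(x):
    """Values of the set significand bits of a normal binary32 number, implicit bit first."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent_field = (bits >> 23) & 0xFF
    if exponent_field in (0, 255):
        raise ValueError("zero, subnormal, infinity, and NaN are not handled in this sketch")
    e = exponent_field - 127          # remove the bias
    fraction_field = bits & 0x7FFFFF
    values = [2.0 ** e]               # implicit leading bit, value 2^e
    for k in range(1, 24):            # explicit bits, highest first
        if (fraction_field >> (23 - k)) & 1:
            values.append(2.0 ** (e - k))
    return values


print(significand_bit_values(0.75))  # [0.5, 0.25]
```

For 0.75 the exponent is −1, so the implicit bit is worth 2^−1 and the highest explicit bit is worth 2^−2. For a number with a different exponent, the same bit positions would carry different values.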
