
When working with the excess representation of integers, I use a bias of 2^(n-1). However, the IEEE 754 standard instead uses 2^(n-1) - 1.

The only benefit that I can think of is a bigger positive range. Are there any other reasons as to why that decision was taken?

Björn Lindqvist
james_dean

1 Answer


The reason is twofold: Infinities/NaNs and gradual underflow.

If you use the exponent to represent both integer powers (n >= 0) and fractional powers (n < 0), you have the problem that one exponent code is needed for 2^0 = 1. The remaining range is then odd, which forces you to give the bigger half either to the fractions or to the integers. For single precision we have 256 exponent codes, 255 once the code for exponent 0 is taken. Now IEEE 754 reserves the highest code (255) for special values: +-Infinity and NaNs (Not a Number), used to signal failure. So we are back to an even count (254 codes shared between the integer and fractional side), but with a lower bias.
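A quick way to see the bias of 127 and the reserved exponent codes is to pull the raw fields out of a few float32 values. The following is a small Python sketch of my own (not part of the original answer), using only the standard struct module:

    import struct

    def fields(x):
        # Reinterpret a Python float as IEEE 754 single precision and split the bits.
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        sign = bits >> 31
        exp_code = (bits >> 23) & 0xFF      # 8-bit biased exponent field
        mantissa = bits & 0x7FFFFF          # 23-bit fraction field
        return sign, exp_code, mantissa

    # The bias is 127 = 2^7 - 1, so 1.0 = 1.0 * 2^0 is stored with exponent code 127.
    print(fields(1.0))            # (0, 127, 0)
    print(fields(2.0))            # (0, 128, 0)
    print(fields(0.5))            # (0, 126, 0)
    print(fields(float('inf')))   # (0, 255, 0)         highest code: reserved for Infinity...
    print(fields(float('nan')))   # (0, 255, nonzero)   ...and NaN
    print(fields(0.0))            # (0, 0, 0)           lowest code: reserved for zero/subnormals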

The second reason is gradual underflow. The standard declares that normally all numbers are normalized, meaning that the exponent indicates the position of the leading bit. To gain one bit of precision, that leading bit is normally not stored but assumed (the hidden bit): the first bit after the exponent field is the second bit of the number, the first is always a binary 1. If you enforce normalization you run into the problem that you cannot encode zero, and even if you encode zero as a special value, numerical accuracy is hampered. +-Infinity (the highest exponent) makes it clear that something went wrong, but underflow to zero for numbers that are too small is perfectly normal and therefore easy to overlook as a potential problem. So Kahan, the designer of the standard, decided that denormalized numbers (subnormals) should be introduced, and that they should include 1/MAX_FLOAT.
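To illustrate the 1/MAX_FLOAT point: in single precision, 1/MAX_FLOAT is smaller than the smallest normal number, so it is only representable thanks to subnormals. A small sketch of my own, assuming NumPy is available for a true float32 type:

    import numpy as np

    max32  = np.finfo(np.float32).max    # largest finite float32, about 3.4e38
    tiny32 = np.finfo(np.float32).tiny   # smallest *normal* float32, about 1.18e-38

    inv_max = np.float32(1.0) / max32    # about 2.94e-39
    print(inv_max < tiny32)              # True: 1/MAX_FLOAT is below the normal range
    print(inv_max != 0.0)                # True: representable as a subnormal, not flushed to zero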

EDIT: Allan asked why the "numerical accuracy is hampered" if you encode zero as a special value. I should rather have phrased it as "numerical accuracy is still hampered". In fact this was how the historical DEC VAX floating point format worked: if the exponent field in the raw bit encoding was 0, the value was considered zero. As an example I take the 32-bit format still rampant in GPUs.

X 00000000 XXXXXXXXXXXXXXXXXXXXXXX

In this case the content of the mantissa field on the right could be ignored completely and was normally filled with zeroes. The sign field on the left could still be significant, distinguishing a normal zero from a "negative zero" (you can get a negative zero from something like -1.0 * 0.0 or from rounding a tiny negative result).

Gradual underflow and the subnormals of IEEE 754, in contrast, do use the mantissa field. Only

X 00000000 00000000000000000000000

is zero. All other bit combinations are valid and, even more practically, you are warned if your result underflows. So what's the point?
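The difference is easy to check by constructing the bit patterns directly. Here is a small Python sketch (my addition, not from the original answer) that decodes a few raw 32-bit patterns:

    import struct

    def f32_from_bits(bits):
        # Interpret a raw 32-bit pattern as an IEEE 754 single precision value.
        return struct.unpack('>f', struct.pack('>I', bits))[0]

    print(f32_from_bits(0b0_00000000_00000000000000000000000))  # 0.0: sign 0, exponent 0, mantissa 0
    print(f32_from_bits(0b1_00000000_00000000000000000000000))  # -0.0, the "negative zero"
    print(f32_from_bits(0b0_00000000_00000000000000000000001))  # ~1.4e-45, smallest subnormal: exponent 0, mantissa != 0
    print(-0.0 == 0.0)                                          # True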

Consider the two numbers

A 0 00000001 10010101111001111111111
B 0 00000001 10010101111100001010000

They are valid floating-point numbers, very small but still finite. As you can see, their first 11 mantissa bits are identical. If you now compute A-B or B-A, the first significant bit of the result falls below the lower exponent range, so without gradual underflow the result is ... 0. So A != B but A-B = 0. Ouch. Countless people have fallen into this trap, and it can be assumed that many of them never noticed it. The same happens with multiplication and division: you need to add or subtract exponents, and if the result falls below the lower threshold: 0. And as you know, 0 * anything = 0. You could have a product S*T*X*Y*Z, and once one partial product is 0, the whole result is 0, even when a perfectly valid and possibly huge number would be the correct answer. It should be said that these anomalies can never be completely avoided due to rounding, but with gradual underflow they become rare. Very rare.
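To see gradual underflow in action, here is a small sketch of my own (assuming NumPy for float32 arithmetic and that no flush-to-zero mode is active): two distinct tiny normal numbers whose difference is a subnormal rather than zero.

    import numpy as np

    tiny = np.finfo(np.float32).tiny      # smallest normal float32, 2**-126
    a = np.float32(1.5)  * tiny
    b = np.float32(1.25) * tiny
    diff = a - b                          # 0.25 * 2**-126 = 2**-128, below the normal range

    print(a != b)                         # True
    print(diff)                           # about 2.94e-39, a subnormal rather than zero
    print(diff != 0.0)                    # True: with gradual underflow, a != b implies a - b != 0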

Wai Ha Lee
Thorsten S.
    I wonder how the hardware efficiency of handling denormals compares with the hardware efficiency of having the next larger number after 1.00B-127 be 1.00B-126, then 1.10B-126, then 1.00B-125, 1.01B-125, etc. In other words, round off every number to the nearest 1.00B-127. That would avoid the weird underflow behavior one would normally get without denormals, even though it wouldn't provide the benefit of their extra range. I would guess the range isn't nearly as important as ensuring that (a-b) is zero only if (a==b), so if the rounding was cheaper than denormals, it could be a win. – supercat Mar 20 '12 at 17:24
  • Strange how arguments repeat over time. In fact, during the creation of the IEEE 754 standard there was a battle between Kahan (use subnormals) and the DEC VAX format, which works almost exactly like IEEE 754 but uses only zero. – Thorsten S. Mar 21 '12 at 21:07
  • Allowing two numbers to exist whose difference is too small to represent is a problem which some implementations choose to ignore. Did VAX format solve the problem or ignore it? My thought would be that the problem shouldn't be ignored, but subnormals aren't the best solution. For cases where multiplications or divisions would resolve to zero, I'd like to see positive and negative "infinitesimal" values, whereas adds or subtracts yielding zero should yield unsigned zero. Never going to happen, but it would get rid of some of the asymmetries surrounding zeroes. – supercat Mar 21 '12 at 21:13
  • Sorry, I ran out of time for editing. The problem with your approach is that you lose precision over a long exponent range. With single precision you lose a factor of 2^24 ≈ 1.6E7 before you get full precision, and with doubles it goes even further, 2^53 ≈ 9E16. Still you risk strange anomalies which no one will understand. There is a rant from Ross Harvey about flawed hardware efficiency, saying that the Intel IEEE 754 8087 needed 70 µs for a multiplication while the NS32 only needed 5 µs (both microseconds). Second comparison with SpecFP95: 9 for Intel Pentium II, 50 for DEC Alpha 21264 – Thorsten S. Mar 21 '12 at 21:26
  • Please don't anyone wonder that my answers seem a bit strange; I was editing my answer out of sync with the supercat comments. One time I hit the return button so it was displayed while I was not ready, one time I hit the 5 min limit. @supercat: While the asymmetric zero is in fact something I also yearn for, Stack Overflow does not like lengthy discussion, so I stop here now. – Thorsten S. Mar 21 '12 at 22:20
  • Adding denormals allows floats to reach down to roughly 2^-151 instead of 2^-127, but I suspect the number of applications needing that range is small compared to the number that will have trouble if `a>b`, but `(a-b)==0`. – supercat Mar 21 '12 at 22:29
  • @ThorstenS. Could you explain why "if you encode zero as special value, the numerical accuracy is hampered"? – Allan Ruin Jul 08 '14 at 00:45
  • @AllanRuin Hope that helps – Thorsten S. Jul 08 '14 at 12:56