Incorrect floating point rounding

Question

On gcc 4.7.3, my fegetround() function returns FE_TONEAREST. According to the c++ reference, this means rounding away from zero. Essentially, it means saving the last bit that was shifted out when adjusting the precision of the mantissa after multiplication (since it will be twice as long as it should be). Afterwards, the saved bit is added to the final mantissa result.

For example, floating point multiplication gives the following results:

0x38b7aad5 * 0x38b7aad5 = 0x3203c5af

The mantissa after multiplication is

  1011 0111 1010 1010 1101 0101
x 1011 0111 1010 1010 1101 0101
-------------------------------
1[000 0011 1100 0101 1010 1110] [1]000 0101 1001 0101 0011 1001

The [23'b] set holds the significant digits, whereas the [1'b] set holds the last bit shifted out. Note that the mantissa for the result is

[000 0011 1100 0101 1010 1111]

The last bit switched to 1 because the [1'b1] set was added to the spliced mantissa (the [23'b] set) due to the rounding mode.

Here is an example that is stumping me, because it looks to me like the hardware isn't rounding correctly.

0x20922800 * 0x20922800 = 0x1a6e34c (check this on your machine)

  1010 0110 1110 0011 0100 1101
x 1010 0110 1110 0011 0100 1101
-------------------------------
01[01 0011 0111 0001 1010 0110 0][1]00 0000 0000 0000 0000 0000

Final Mantissas:       
Their Result:      01 0011 0111 0001 1010 0110 0
Correct Result(?): 01 0011 0111 0001 1010 0110 1

I've been crunching binary all day, so it's possible I'm missing something simple here. Which answer is correct with the given rounding mode?

“returns FE_TONEAREST. According to the c++ reference, this means rounding away from zero” -> are you quite sure? Look again at the name. — Pascal Cuoq, Dec 06 '13 at 23:39
I think whenever the question is "My error or Intel's", you can give an answer without reading the question, and 99% of the time you'll be right. — Kerrek SB, Dec 06 '13 at 23:40
Hmmm my `fegetround()` returns `0`, which I thought I saw meant `FE_TONEAREST` somewhere... Perhaps I am mistaken about that! EDIT: I just `assert(fegetround() == FE_TONEAREST);`ed successfully. — Suedocode, Dec 06 '13 at 23:42
@dan04: How many people flooded SO to ask about their floating point divisions on their new Pentium back when it was released? — Kerrek SB, Dec 06 '13 at 23:42
@Aggieboy : Round-to-nearest means "round 0.5 to nearest even number." That does seem to be what's happening. — Joe Z, Dec 06 '13 at 23:46
@JoeZ: It wasn't as easy to find via Lycos and Altavista, but it always occupied a prominent slot in the Yahoo "expert forums" directory... — Kerrek SB, Dec 06 '13 at 23:48
@JoeZ The refenence I linked in the OP says "Rounding x to-nearest selects the possible value that is nearest to x, with halfway cases rounded away from zero." I don't see anything mentioned about even numbers. — Suedocode, Dec 06 '13 at 23:48
Your question probably would not have been downvoted but for the title. — Reinstate Monica -- notmaynard, Dec 06 '13 at 23:50
@Aggieboy : cppreference.com is misleading. The default IEEE-754 rounding mode is "round to nearest, ties to even." See here: http://en.wikipedia.org/wiki/IEEE_754-1985#Rounding_floating-point_numbers — Joe Z, Dec 06 '13 at 23:50
@Aggieboy Do you have some reason to believe that reference? — David Schwartz, Dec 06 '13 at 23:53
@KerrekSB : Was that the same thing? The StackExchange 'about' page ( http://stackexchange.com/about ) says StackOverflow was created in 2008. Or are they just referring to SO as a distinct website, and not a forum on Yahoo? — Joe Z, Dec 06 '13 at 23:55
@JoeZ: No, I was kidding. SO is indeed not that old. But my point was that probably most people did *not* immediately come out and wonder in public about the floating point division, despite the bug. It was an embarrassing bug, but it took some extremely specific and intricate work to stumble across it, and the finder presumably spend lots of careful research on making sure it wasn't his fault. — Kerrek SB, Dec 06 '13 at 23:57
@DavidSchwartz I believed it for the same reason that Joe Z believes his wiki reference. ^.^ — Suedocode, Dec 06 '13 at 23:57

score 6 · Accepted Answer · answered Dec 06 '13 at 23:48

6

When rounding to nearest, IEEE specifies that ties round to even. 0 is even, 1 is odd, so Intel is correct.

answered Dec 06 '13 at 23:48

David Schwartz

179,497
17
214
278

Wouldn't this mean all inexact calculations end with a `0` in the mantissa? Why didn't the first one? – Suedocode Dec 06 '13 at 23:51
5

@Aggieboy: It only rounds 1/2 to even. If you are even an iota above 1/2, it'll round up, as in your first example. Your second example was rounding _exactly_ 1/2. – Joe Z Dec 06 '13 at 23:52
Aha that was the explanation I was looking for! I misunderstood what that rounding mode meant. – Suedocode Dec 06 '13 at 23:59

score 2 · Answer 2 · answered Dec 10 '13 at 21:13

First rounding to nearest lacks one detail here. It is rounding to nearest (even).

IEEE 754 standard (Section 4.3.1) quote:

roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered

In your first example you compute square of (8.75794e-5) which (if represented as 32 bit float) has the following hex pattern: 0x38b7aad5.

All 24 significand bits of (8.75794e-5) are:

0xb7aad5 = 1.0110111_10101010_11010101

Now after squaring that you get:

1.0000011_11000101_10101110_10000101_10010101_00111001

It is noteworthy that in 99% of cases your computations will be performed on FPU (x87 probably) which operates on 80bit floating point format.

Intel® 64 and IA-32 Architectures Software Developer’s Manual

(PROGRAMMING WITH THE X87 FPU):

When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format).

Now you want to store your result in 32 bit float:

1.[0000011_11000101_10101110]10000101_10010101_00111001

and here is where rounding modes matter. IEEE 754 defines 4 of them but let's focus on the default one (rounding to nearest (even)) as we discuss this one here.

Now that your FPU has the result (the whole - we have 64 significand bits in 80bit format) computed it must perform rounding to fit the number within 32 bits (24significand bits). All 23 bits that would need to be explicitly stored are placed within brackets above.

Now rounding to nearest has nothing to do with even word in this particular case since bits on the right of the bracket are not halfway between:

1.[0000011_11000101_10101111]
and
1.[0000011_11000101_10101110]

but they are nearer to

1.[0000011_11000101_10101111]

This is why your result's significand is 0x3203C5AF.

Now problematic result of squaring 2.4759832E-19 0x20922800.

24 significand bits of 2.4759832E-19 are :

0x922800 = 1.0010010_00101000_0000_0000

and squared:

1.[0100110_11100011_01001100]10000000_00000000_0000000

And here is where even part really matters. Now your value lies exactly halfway between:

1.[0100110_11100011_01001101]
and
1.[0100110_11100011_01001100]

Above 2 values are said to bracket your value. From them you now need to choose even one (the latter since lsb=0).

Now you know why 24bits of your result are 0xA6E34C(nearest even) and not 0xA6E34D(nearest but odd)

Incorrect floating point rounding

2 Answers2