Possible loss of precision between two different compiler configurations

Question

I am currently stuck on a problem at work that involves a possible loss of precision when the compiler configuration is changed from Debug to Release, which have different levels of optimization. For some reason, elsewhere in our code, extremely large values have been used for covariance matrices (and things of that sort), values somewhere along the lines of 1e90. The problem I'm encountering is that whenever there is any sort of loss of precision in a calculation and one of these extremely large values is still around, the product of the two introduces some instability. I'm not sure why more reasonable values aren't used, but I'm not the one that wrote this code, so yeah... As of now, I believe I have tracked down the problem to a specific location. The exact numbers I have at that location are shown below:

DBL sum = 6.000000040000000400e-004; // same for debug and release configurations
const DBL dinv = 2.000000020000000300e-004; // same for debug and release configurations

Note that DBL is your ordinary double:

typedef double DBL;

Then, the following operation is performed:

sum /= dinv;

This yields:

sum = 2.999999990000000100e+000 // (for debug configuration)<br>
sum = 2.999999989999999600e+000 // (for release configuration)

I took a look at the disassembly for the two configurations and found some differences (expected because of different amounts of optimization).

--DEBUG--

1D91FF73  movsd       xmm0,mmword ptr [sum]
1D91FF78  divsd       xmm0,mmword ptr [dinv]
1D91FF7D  movsd       mmword ptr [sum],xmm0

I haven't ever really read disassembly, but my understanding is as follows: sum is moved to xmm0, then xmm0 is divided in-place by dinv (result is in xmm0 since division is in-place), then xmm0 is moved to sum.

As expected, the disassembly for release is different.

--RELEASE--

1D7557AB  movsd       xmm1,mmword ptr [esp+50h]  
1D7557B1  xorps       xmm0,xmm0  
1D7557B4  mulsd       xmm1,mmword ptr [esp+68h]

The disassembly for the assignment of sum to dinv is:

1D7B55B7  movsd       xmm1,mmword ptr [esp+68h]

Am I correct in thinking that dinv is the value pointed to by the pointer represented by [esp+68h] and sum is the value pointed to by the pointer represented by [esp+50h]? If not, what is the case?

Does anybody know why I am losing precision? What is the purpose of xorps?

The x86 Instruction Set Reference at this link may be helpful: http://x86.renejeschke.de/

--UPDATE--
As the answer below mentioned, the Debug configuration was using /fp:precise and the Release configuration was using /fp:fast (was using Microsoft Visual Studio 2013, to get to the build configuration settings for a project, simply right-click on that project, click properties, then navigate to C/C++). For me, this resulted in round-off errors on the order of 1e-15, give or take an order. This was a problem for me because elsewhere in the code, some people were using extremely large values (on the order of 1e90, give or take an order). One thing I did to "break" the Debug configuration for testing purposes was to split the sum /= dinv computation into two steps. First, take the reciprocal of dinv by computing 1.0/dinv (this is mentioned as being a bad operation to perform in the answer below), multiply that result by sum, and place the result into sum. I found when I did this that Debug and Release both behaved poorly.

Thanks for editing, I spent a few minutes trying to get it right, but it kept giving me errors. :-) — skankinkid33, Mar 20 '14 at 11:20
Xorps is just XOR. So it is basically clearing the xmm0 register. — anonymous, Mar 20 '14 at 11:20
Sorry I forgot to specify what DBL was. `typedef double DBL;` — skankinkid33, Mar 20 '14 at 11:31
if you are you using gcc: -Ofast Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays. — Alexander Oh, Mar 20 '14 at 11:37
This doesn't make sense. It looks like the release version is doing multiplication on local variables (ie. some offset of stack pointer). And loading whatever is at esp+68h back into xmm1. I just noticed that code that moves esp+68h back to xmm1 is at some further address location. — anonymous, Mar 20 '14 at 12:00
Should I copy and paste more of the disassembly for the release version? Maybe I copied and pasted the wrong part. — skankinkid33, Mar 20 '14 at 12:01
Are you able to reproduce this problem with a small test code and post the entire ASM code? Let the ASM gurus have a crack at it too. — anonymous, Mar 20 '14 at 12:19
Good idea, I'll write up a simple function and post it here by the end of the day. Have a few other things to do in the meantime... Thank you to everybody for their help so far! — skankinkid33, Mar 20 '14 at 12:21
I was not able to reproduce the problem with a small test code as I could not get it to function in the same way; however, the answer below solved my problem. I will update my original posting with a bit of explanation of one or two things I tried. — skankinkid33, Mar 21 '14 at 13:23

manlio · Accepted Answer · 2014-03-20T20:27:53.423

1

If you are using

GCC with -freciprocal-math (directly or indirectly via -funsafe-math-optimizations or -ffast-math or -Ofast)
Visual Studio and the fp:fast mode for Floating-Point Semantics

the compiler may generate a standard division instruction in debug mode:

1D91FF78  divsd       xmm0,mmword ptr [dinv]

or a "division by multiplicative inverse" in release mode:

1D7557B4  mulsd       xmm1,mmword ptr [esp+68h]

Mathematically

a / b = a * (1 / b)

but in the real world multiplying by the reciprocal is always going to introduce more error and compilers are not allowed to perform this optimization because the results would be different and non-conformant (wrt IEEE-754).

edited Mar 20 '14 at 20:27

answered Mar 20 '14 at 15:32

manlio

18,345
14
76
126

1

Thank you for your answer. I looked at the compiler settings and found that /fp:precise was selected for the Debug configuration and /fp:fast was selected for the Release configuration. One other thing I tested to convince myself that a loss of precision was the problem was to perform the `sum /= dinv` operation into two steps. First, I computed `1.0/dinv`, then I multiplied the result by `sum`. When I did this, the Debug and Release configurations were more in agreement with each other, and both wrong (due to the really large values I mentioned in my description). – skankinkid33 Mar 21 '14 at 13:22

Possible loss of precision between two different compiler configurations

1 Answers1

Linked