
We are experiencing an issue with floating point precision within a dynamic library.

The set-up is as follows:

  • We have a dynamic library, which performs a computation X on a large array of floating point numbers. X consists of a lot of floating point operations.
  • We link this dynamic library to two executables: A and B.
  • Within the library we print the input for computation X.
  • When running executables A and B, exactly the same input is reported (up to DBL_DIG decimal digits).
  • The output of computation X, however, is different for executable A than it is for executable B.

Both executables and the library are written in C++ and compiled on the same machine with the same GCC compiler version. The library is compiled only once, with the same compiler settings as executable A; the compiler settings for executable B may differ.

As the same library is used, we expected the same computation precision for both executables when providing the same input. It looks like the floating point precision of the library is influenced by external factors, e.g. process-specific configuration.

Is this possible, and if so, how can we make sure we get the same precision in both runs (programs A and B)?

Edit 1

I succeeded in creating a minimal example that demonstrates the differences. If I use the following code in the library (say as computation X) the results are different for both runs (A and B):

#include <cmath>    // std::log
#include <iomanip>  // std::setprecision
#include <iostream>

float* value = new float;
*value = 2857.0f;
std::cout << std::setprecision(15) << std::log(*value) << std::endl;

I also printed the floats in binary format and they show a difference in the last bit.

Unfortunately, I cannot control the whole build chain of executable A. In fact, A is itself a dynamic library, used by another executable whose compiler options I can neither control nor know.

I tried many different optimization-related compiler options on executable B to see whether I could reproduce the results of executable A, but so far this has not solved the problem.

Edit 2

The assembler output of the code above is:

.LFB1066:
  .cfi_startproc
  .cfi_personality 0x9b,DW.ref.__gxx_personality_v0
  push  rbp #
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  push  rbx #
  .cfi_def_cfa_offset 24
  .cfi_offset 3, -24
  sub rsp, 8  #,
  .cfi_def_cfa_offset 32
  mov edi, 4  #,
  call  _Znwm@PLT #
  mov DWORD PTR [rax], 0x45329000 #* D.23338,
  mov rdi, QWORD PTR _ZSt4cout@GOTPCREL[rip]  # tmp66,
  mov rax, QWORD PTR [rdi]  # cout._vptr.basic_ostream, cout._vptr.basic_ostream
  mov rax, QWORD PTR -24[rax] # tmp68,
  mov QWORD PTR 8[rax+rdi], 15  # <variable>._M_precision,
  movsd xmm0, QWORD PTR .LC1[rip] #,
  call  _ZNSo9_M_insertIdEERSoT_@PLT  #
  mov rbx, rax  # D.23465,
  mov rax, QWORD PTR [rax]  # <variable>._vptr.basic_ostream, <variable>._vptr.basic_ostream
  mov rax, QWORD PTR -24[rax] # tmp73,
  mov rbp, QWORD PTR 240[rbx+rax] # D.23552, <variable>._M_ctype
  test  rbp, rbp  # D.23552
  je  .L9 #,
  cmp BYTE PTR 56[rbp], 0 # <variable>._M_widen_ok
  je  .L5 #,
  movsx esi, BYTE PTR 67[rbp] # D.23550, <variable>._M_widen

Edit 3

As suggested in the comments I printed both the floating point rounding mode and SSE status information in the library.

For both runs (executable A and B) I get the same values:

  • Rounding mode: 895
  • SSE status: 8114
  • Is one of the binaries being compiled with a high level of optimization? Some levels might enable unsafe math or modify default precision. See: https://gcc.gnu.org/wiki/FloatingPointMath for a list of switches that change the behavior of floating point operations in GCC. – Matthew Jun 16 '15 at 19:00
  • I have to look this up. Will come back to it tomorrow. But does this mean the compilation settings of the executable can influence the precision within the already compiled shared library? – Coert Metz Jun 16 '15 at 19:27
  • Yes, if the option changes the default float width or computation behavior. Try to print the actual bytes of the variables inside the library and don't use printf to see if they're being represented correctly, and use `-ffloat-store` across all compilations to make sure the width stays the same. As a preliminary you should force `-O0` across all compilations to see if that fixes the issue. – Matthew Jun 16 '15 at 19:38
  • Yes this is possible depending on the optimization levels. Certain levels of optimization will result in different instruction sets being used (see SIMD, Single Instruction Multiple Data) as well as doing several intermediate calculations on the CPU instead of storing intermediate values back to memory. – Matthew Hoggan Jun 16 '15 at 19:45
  • Also, the different applications may set up different rounding modes and other floating point calculation parameters... – twalberg Jun 16 '15 at 20:52
  • Thanks for the suggestions. I updated my question to comment on their results. – Coert Metz Jul 01 '15 at 07:12
  • Just to confirm, you're loading the exact same .so into two different executables, and it shows this difference? What hardware platform are you using - x86? – Useless Jul 01 '15 at 07:19
  • True. The same so file (on disk) is used in the two different executables. The platform is x86_64. – Coert Metz Jul 01 '15 at 07:21
  • Even if the rounding mode is set differently by the different executables (for example), I'm surprised that affects a hard-coded float literal assignment. Can you check the assembly for your minimal example and see what it's doing? – Useless Jul 01 '15 at 08:07
  • @Useless: I added the assembler to the question. I am not so familiar with assembler output. This is just the first section (.LFB1066) below the function name. There are some more below that. Let me know if this is sufficient. – Coert Metz Jul 01 '15 at 09:07
  • Try to dump the result of `fegetround` (or even `_FPU_GETCW`) in the two cases. – Matteo Italia Jul 01 '15 at 09:10
  • @Matteo fegetround seems to be a C++11 feature. Is there a C++98 equivalent to get this information? I am building in 98 mode using a fairly old GCC version (4.4). – Coert Metz Jul 02 '15 at 10:36
  • @CoertMetz: you can do `#define FPU_GETCW(x) asm volatile ("fnstcw %0":"=m" (x))` and then use it like `uint16_t cw; FPU_GETCW(cw);`. – Matteo Italia Jul 02 '15 at 11:11
  • @Matteo Thanks. In both cases the output is 895. – Coert Metz Jul 02 '15 at 12:09
  • @CoertMetz: What about the SSE control status register? `#define MX_GETCSR(x) asm volatile ("stmxcsr %0":"=m" (x))` / `uint32_t csr; MX_GETCSR(csr);`. – Matteo Italia Jul 02 '15 at 12:16
  • @Matteo this prints 8114 in both cases. – Coert Metz Jul 06 '15 at 07:02

1 Answer


The answer to your question is: yes, in principle a process can change the floating-point context within which your code operates.


About your particular code and values:

The rounding mode (as Matteo suggests) could affect string formatting as it repeatedly divides by 10 - but I can't reproduce the problem using std::fesetround.

I also can't see how it would really affect the bit pattern you say was different. The assembly shows the literal 0x45329000, which is exactly the encoding of 2857.0f, and the literal itself can't be altered by the floating-point environment.

– Useless
  • Not sure if this adds information, but note that in streaming to cout I take the log of the value 2857. The result of this log differs; the value 2857 itself does not seem to. – Coert Metz Jul 01 '15 at 11:45