Why does this calculation give different result in boost::thread and std::thread?

Question

When this floating point calculation is executed in a boost::thread, it gives a different result than when executed in a std::thread or in the main thread.

void print_number()
{
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    printf("%llX\n%0.25f\n", *reinterpret_cast<unsigned long long*>(&v), v);
}

This seems to happen because boost::thread by default uses 53-bit internal precision for floating point math, while the main thread uses 64-bit precision. If the FPU state is reset with _fpreset() after the boost::thread has been created, the result is the same as in the main thread.

I am using Embarcadero C++ Builder 10.1 (compiler bcc32c version 3.3.1) and Boost 1.55.0. My environment is Windows 7, and I am building for 32-bit Windows target.

Working example:

#include <tchar.h>
#include <thread>
#include <boost/thread.hpp>
#include <cstdio>
#include <cstring>  // memcpy
#include <cmath>
#include <cfloat>

namespace boost { void tss_cleanup_implemented() {} }

void print_number()
{
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    // Edit:
    // Avoid the undefined behaviour caused by reinterpret_cast, as
    // mentioned in some answers and comments.
    unsigned long long x;
    memcpy(&x, &v, sizeof(x));

    printf("%llX\n%0.25f\n", x, v);
}

void print_number_2()
{
    // Reset FPU precision to default
    _fpreset();
    print_number();
}

int _tmain(int argc, _TCHAR* argv[])
{
    print_number();

    std::thread t1(&print_number);
    t1.join();

    boost::thread t2(&print_number);
    t2.join();

    boost::thread t3(&print_number_2);
    t3.join();

    getchar();
    return 0;
}

Output:

3EAABB3194A6E99A
0.0000007966525939409087744
3EAABB3194A6E99A
0.0000007966525939409087744
3EAABB3194A6E999
0.0000007966525939409087488
3EAABB3194A6E99A
0.0000007966525939409087744
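
To put a number on the gap between the two bit patterns above, here is a minimal sketch that rebuilds both doubles and measures their distance. The helper names `from_bits` and `ulp_distance` and the two constants are made up for illustration, and the code assumes `double` is an IEEE 754 binary64 value, which holds on Win32 but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Rebuild a double from its 64-bit pattern (assumes IEEE 754 binary64).
double from_bits(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits wide");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Distance in units in the last place between two positive finite doubles,
// given as bit patterns. For such values, adjacent patterns are adjacent doubles.
std::uint64_t ulp_distance(std::uint64_t a, std::uint64_t b) {
    assert((a >> 63) == 0 && (b >> 63) == 0);  // both sign bits must be clear
    return a > b ? a - b : b - a;
}

// Bit patterns copied from the output above.
const std::uint64_t kMainBits  = 0x3EAABB3194A6E99AULL;  // main thread and std::thread
const std::uint64_t kBoostBits = 0x3EAABB3194A6E999ULL;  // boost::thread
```

Calling `ulp_distance(kMainBits, kBoostBits)` gives 1, so the two results are neighbouring doubles: the boost::thread value is the next representable number below the other one.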

Question:

Why does this happen? Isn't a new thread supposed to inherit the floating point environment from the parent thread?
Is this a bug in the compiler or in Boost, or are my expectations wrong?
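
The inheritance expectation can be checked with portable `<cfenv>` calls instead of the compiler-specific ones. This is only a sketch: the function name `rounding_mode_seen_by_child` is hypothetical, it looks at the rounding mode rather than the x87 precision setting, and whether the child really sees the parent's mode depends on the platform's thread implementation.

```cpp
#include <cfenv>
#include <thread>

// Returns the rounding mode observed inside a freshly started std::thread
// after the calling thread has switched itself to `mode`.
int rounding_mode_seen_by_child(int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);   // change the parent's environment

    int seen = -1;
    std::thread t([&seen] { seen = std::fegetround(); });
    t.join();

    std::fesetround(saved);  // restore the parent's environment
    return seen;
}
```

If the new thread inherits the environment, `rounding_mode_seen_by_child(FE_DOWNWARD)` returns `FE_DOWNWARD`. A thread library that resets or disturbs the FPU state during startup would return something else.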

Completely unrelated, but I gotta say, your question presentation is absolutely *stellar*. — WhozCraig, Jul 14 '16 at 10:35
I can confirm this, using C++Builder 10.1 Berlin, 32 bit target, on Windows 7. Obviously the `_fpreset()` makes the difference. I assume that `boost::thread` doesn't do it, and `std::thread` does. — Rudy Velthuis, Jul 15 '16 at 19:02
FWIW, to make it compile I had to add a line with `#define BOOST_THREAD_USE_LIB` before `#include <boost/thread.hpp>`. I assume this was defined externally? – Rudy Velthuis, Jul 15 '16 at 19:07
@Ville-ValtteriTiittanen: the difference was indeed the fact that Boost doesn't do a `_fpreset()`, which is a C++Builder specific thing, and the `std` library was obviously modified to make that call. See my answer. — Rudy Velthuis, Jul 15 '16 at 19:45
FWIW, I edited your `print_number()` function to use `memcpy()` to stop the complaints about undefined behaviour, since that was irrelevant to the question. — Rudy Velthuis, Jul 16 '16 at 00:11

score 5 · Answer 1 · edited Jul 14 '16 at 13:34

This: *reinterpret_cast<unsigned long long*>(&v) is undefined behaviour because v is not an unsigned long long. If you want to copy the binary representation of a double to an integral type, use memcpy(). Note that, even with memcpy(), what the binary representation looks like is implementation-defined, but you're guaranteed that you can 'load back what you've saved'. Nothing more AFAIK.
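
A minimal sketch of the memcpy() approach, including the 'load back what you've saved' round trip. The names `to_bits` and `from_bits` are illustrative, and the `static_assert` encodes the assumption that a double is 64 bits wide.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");

// Copy the object representation of a double into an integer.
std::uint64_t to_bits(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Inverse: rebuild the double from the saved representation.
double from_bits(std::uint64_t bits) {
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

To print the result with `%llX`, cast it first: `printf("%llX\n", static_cast<unsigned long long>(to_bits(v)));`. The round trip `from_bits(to_bits(v)) == v` holds for any non-NaN value; the particular hex digits you see depend on the platform's floating point format.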

edited Jul 14 '16 at 13:34 by zero298 · answered Jul 14 '16 at 13:29 by lorro

And why are the values different when the double value is printed in decimal form? In this example I used the internal representation just to show that this is not a bug in `printf`. – VLL Jul 14 '16 at 17:21
@Ville-ValtteriTiittanen This answer is only pointing out a bug in your code, it doesn't mean that this bug is the source of your problem. Fixing the bug is a prerequisite for any further investigation, even if it appears not to "change" anything. – Kuba hasn't forgotten Monica Jul 14 '16 at 17:54
If this answer is only pointing out a bug in the code, then it should have been a comment. The bug is not responsible for the difference in the results (in 32 bit C++Builder, an `unsigned long long` is the same size as a `double`), so this does not answer the question. – Rudy Velthuis Jul 15 '16 at 21:18
@RudyVelthuis: being the same size doesn't mean it's not UB. It's *still* UB. Also, in the second part I explicitly wrote that you've only got the 'load back what you've saved' warranty, nothing else. In particular, for doubles there's no warranty that your implementation will use the same precision everywhere or that the result of two mathematically equivalent calculations will be the same. That's for Std. C++. – lorro Jul 15 '16 at 21:34
I agree it is UB. But in this implementation, it does not change the result and it does not explain the difference. I just tried with memcpy() from v to an unsigned long long, and the result is the same. So this UB is not the cause of the difference between the uses of boost::thread and std::thread. So again, I think this should have been a comment, not an answer. – Rudy Velthuis Jul 15 '16 at 22:05
@RudyVelthuis: please read my last two sentences as well (both in the answer and in the previous comment) Thanks. – lorro Jul 16 '16 at 11:32
@lorro: I did. Did you see how I changed the question to use `memcpy()` as you suggested and how this did not change anything? I ran the code here, both using `reinterpret_cast<>()` and `memcpy()` and I see the same results. The behaviour may have been undefined, but not undetermined and it was completely irrelevant to the problem. The problem is that you see different results in boost::thread and in std::thread. The hex display was merely to demonstrate how much the results differ (1 ulp), and could have been omitted altogether. I still think this should not be an *answer*, as it isn't. – Rudy Velthuis Jul 16 '16 at 11:37
@RudyVelthuis: 'In particular, for doubles there's *no warranty* that your implementation will use the *same precision everywhere* or that the *result of two mathematically equivalent calculations will be the same*. That's for Std. C++.', post: 'Note that, even with `memcpy()`, it's *implementation defined how the binary representation will look like*, but *you're guaranteed that you can 'load back what you've saved'*. Nothing more AFAIK.' – lorro Jul 16 '16 at 11:47
This is not about "everywhere", so that doesn't matter. This is a question about the new, Clang-based 32 bit compiler for C++Builder. In Win32, the format of a double is known to be IEEE 754 compliant, a little-endian 64 bit type. The problem probably does not appear in other compilers, so "everywhere" does not count. – Rudy Velthuis Jul 16 '16 at 11:53
Anyway, pointing to UB should have been a comment, not an answer. It certainly does not answer the question of why there is a difference in results, which would also exist if the hex display was not given at all. – Rudy Velthuis Jul 16 '16 at 11:55

score 4 · Answer 2 · answered Jul 14 '16 at 16:13

This isn't a difference between 64 and 53 bit precision FPU calculations, it is a difference in ROUNDING. The only difference between the two results is in the least significant bit of the answer. It looks like boost's thread start code is not properly initializing the FPU flags, and the default rounding mode is down or chop, rather than nearest.

If this is the case, then it could be a bug in boost::thread. It could also come around if another library is changing the FPU flags (via _controlfp_s or a similar function), or if the new thread is part of a thread pool, a previous user of the thread changed the flags, and the pool did not reset them before reusing the thread.
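
To illustrate how a rounding mode alone can move the last bit, here is a small sketch using the standard `<cfenv>` interface. It is not what Boost or C++Builder does internally; the function name `one_tenth_bits` is made up, and it assumes IEEE 754 doubles and hardware that honours `fesetround()` for division.

```cpp
#include <cfenv>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");

// Divide 1.0 by 10.0 under the given rounding mode and return the result's bits.
// volatile keeps the compiler from folding the division at compile time or
// moving it across the fesetround() calls.
std::uint64_t one_tenth_bits(int mode) {
    const int saved = std::fegetround();
    volatile double numerator = 1.0;
    volatile double denominator = 10.0;

    std::fesetround(mode);
    volatile double quotient = numerator / denominator;
    std::fesetround(saved);

    const double result = quotient;
    std::uint64_t bits;
    std::memcpy(&bits, &result, sizeof bits);
    return bits;
}
```

With `FE_TONEAREST` the pattern ends in `...999A`, with `FE_DOWNWARD` it ends in `...9999`: a one-bit difference of the same kind as in the question's output. That is consistent with the rounding hypothesis but does not prove it, since a change in the x87 precision setting can also flip the last bit.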

Not a previous use of the thread. Win32 API functions tend to use different FPU settings from those in C++Builder (and Delphi). Thread implementations on Win32 use such API functions. The C++Builder STL knows this, and resets the FPU. Boost does not. Hence the differences. — Rudy Velthuis, Jul 15 '16 at 20:48

Rudy Velthuis · Accepted Answer · 2016-07-15T22:13:26.060

The difference seems to be the fact that the std::thread implementation does an _fpreset(), while boost::thread obviously doesn't. If you change the line

namespace boost { void tss_cleanup_implemented() { } }

to (formatted a little for clarity):

namespace boost 
{ 
    void tss_cleanup_implemented() 
    { 
        _fpreset(); 
    }
}

You will see that all values are exactly the same now (3EAABB3194A6E99A). That tells me that Boost doesn't do an _fpreset(). This call is necessary because some Windows API calls mess up the standard FPU settings C++Builder (32 bit) uses and don't set them back to what they were (this is a problem you can encounter in Delphi as well).

Both std::thread and boost::thread use Win32 API calls to handle threads.

Something tells me that you expected this already, hence the test with print_number_2() which does an _fpreset().
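
Until the library itself does the reset, one possible workaround in user code is to wrap every thread entry point so the reset runs first in the new thread. The sketch below is an assumption-laden illustration, not part of Boost or the C++Builder runtime: `reset_fp_state`, `FpResetWrapper` and `with_fp_reset` are hypothetical names, and the non-Windows branch uses `std::fesetenv(FE_DFL_ENV)` only as a stand-in for `_fpreset()`.

```cpp
#include <cfenv>
#include <thread>
#include <utility>
#if defined(_WIN32)
#include <float.h>  // _fpreset()
#endif

// Put the calling thread's floating point state back to the runtime default.
inline void reset_fp_state() {
#if defined(_WIN32)
    _fpreset();                 // the call used in the question
#else
    std::fesetenv(FE_DFL_ENV);  // portable stand-in on other platforms
#endif
}

// Callable that resets the FP state and then runs the wrapped callable.
template <typename F>
struct FpResetWrapper {
    F f;
    void operator()() {
        reset_fp_state();
        f();
    }
};

// Hypothetical helper: build the wrapper without spelling out the type.
template <typename F>
FpResetWrapper<F> with_fp_reset(F f) {
    return FpResetWrapper<F>{std::move(f)};
}
```

In the sample program this would be used as `boost::thread t2(with_fp_reset(&print_number));`, which has the same effect as `print_number_2()` without a separate function per entry point.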

answered Jul 15 '16 at 19:38 by Rudy Velthuis · edited Jul 15 '16 at 22:13

Is this still an issue when using a 64-bit version of Windows? I would only hope that they stop using the x87 FP stack. – Tim Jul 16 '16 at 00:07
I didn't try this on 64 bit yet, but I would guess it isn't. I'll check. – Rudy Velthuis Jul 16 '16 at 00:13
@Tim: no, it is not an issue. The same value in each test, **but now it is `3EAABB3194A6E998`**! I guess because for Win64, the C++Builder compiler uses SSE, and that does not have 80 bit intermediates. – Rudy Velthuis Jul 16 '16 at 00:16
Excellent. Thanks for checking! – Tim Jul 16 '16 at 02:24
This answer correctly identifies the problem, so I marked it as accepted. However, `tss_cleanup_implemented()` seems to be wrong place to add this. It is only called for the first `boost::thread`, which can be seen when a breakpoint is added in debugger. If you add more boost threads, which you start like `t2`, they will still give incorrect results. – VLL Jul 18 '16 at 10:39
I know about tss_cleanup_implemented() and why it exists, but for this sample project, it was the easiest way to insert _fpreset(). In real life, it should be in the Boost sources, with a conditional for the Embarcadero compiler. – Rudy Velthuis Jul 18 '16 at 10:56
I see. I have made a bug report to Boost: https://svn.boost.org/trac/boost/ticket/12330 – VLL Jul 18 '16 at 11:04

score 1 · Answer 4 · answered Jul 14 '16 at 14:08

To whit, you need a better compiler.

"This seems to happen because boost::thread is by default using 53-bit internal precision for floating point math, while the main thread is using 64-bit precision. If status of FPU unit is reset with _fpreset() after the boost::thread has been created, the result is the same as in the main thread."

This is insane. If your compiler is using a different FP unit (i.e., x87 vs SSE) for different regions of code, you should burn that compiler with the biggest fire you can find.

Running this code under g++-6.1 and clang++-3.8 on Linux Mint 17.3, gives identical results for each thread type.

#include <thread>
#include <boost/thread.hpp>
#include <cstdio>
#include <cmath>

void print_number() {
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    printf("%llX\n%0.25f\n", *reinterpret_cast<unsigned long long*>(&v), v);
}

int main() {
    print_number();

    std::thread t1(&print_number);
    t1.join();

    boost::thread t2(&print_number);
    t2.join();
}

$CXX -std=c++14 -O3 -o test test.cpp -pthread -lboost_thread -lboost_system

3EAABB3194A6E999
0.0000007966525939409086685

3EAABB3194A6E999
0.0000007966525939409086685

3EAABB3194A6E999
0.0000007966525939409086685

As @lorro noted in his/her answer, you are breaking the aliasing rules in the reinterpret_cast.

answered Jul 14 '16 at 14:08 by Tim

Re: "To whit [sic], you need a better compiler." Since `std::thread` works as expected and `boost::thread` doesn't, it seems much more likely that `boost::thread` is doing something peculiar than that the compiler is. – Pete Becker Jul 14 '16 at 16:44
Yet two of the three top-tier C++ compilers (the third being MSVC, which I don't have access to) provide consistent results across implementations of `std::thread`. – Tim Jul 14 '16 at 17:05
I don't see what your point is. Yes, `std::thread` works fine everywhere, as far as has been reported. It's `boost::thread` that doesn't work. There's nothing here that justifies blaming the compiler. – Pete Becker Jul 14 '16 at 17:18
By switching compilers and using the same boost implementation, I see consistent results between `std::thread` (at least two different implementations) and `boost::thread` indicating that it is _not_ the Boost implementation that is at fault. – Tim Jul 14 '16 at 17:52
Non sequitur. Even if there's something wrong in what the compiler does (and there's nothing here to indicate that that's the case), it's the responsibility of the library to work around it. Since floating-point settings seem to be wrong for `boost::thread`, it's not doing what it's supposed to do. – Pete Becker Jul 14 '16 at 19:25
I have not tested other compilers, but if this is a Boost bug, some things could hide the problem from you: 1) you are using `-O3`, so maybe your compiler does the whole calculation at compile time and prints a constant value, 2) there might be some compiler or OS specific code in Boost headers that causes this. – VLL Jul 14 '16 at 19:27
@Ville-ValtteriTiittanen I can rule out (1) because I looked at the emitted assembly and the cmath routines are being called. This makes sense as they are not `constexpr`. (2) is an open possibility. For example, I didn't consider if the OP is using a 32-bit environment. – Tim Jul 14 '16 at 20:07
@PeteBecker Embarcadero claims to be based on Clang. Yet Clang++-3.8 has no issue building correct code for Boost::Thread on my system. As I noted I don't have access to a Windows machine, so I will hold out judgement on that platform until someone can build it there with clang++-3.8. "Since floating-point settings seem to be wrong for boost::thread" Conclusion assumed without evidence. If this were true, then my results would have been the same as the OP's (unless Clang has a yet-unknown Windows-specific bug for Boost::Thread). – Tim Jul 14 '16 at 20:15
Go for it, Tim. You have not presented **any** evidence that the compiler is at fault, just hand-waving arguments that it must be the case because Boost works okay with some **other** compiler. And, again, **even if** there's a bug in the Embarcadero compiler, it's the responsibility of the library to work around it. Your made-up explanations for why the compiler is insane simply don't hold water. – Pete Becker Jul 14 '16 at 20:26
`it's the responsibility of the library to work around it` That's a strange view. – deviantfan Jul 15 '16 at 19:43
It is not really a bug in Boost, it is an omission of calling _fpreset(), which is specific to C++Builder on Win32, since it uses different defaults for the FPU control word and Win32 API calls tend to change this. The STL for C++Builder was enhanced to call this function, the general Boost code has no knowledge of this. – Rudy Velthuis Jul 16 '16 at 09:30
