How to make GNU GCC optimize OpenMP threads similarly

Question

This is my first post here. Yay! Back to the problem:

I'm learning how to use OpenMP. My IDE is Code::Blocks. I want to improve some of my older programs. I need to be sure that the results will be exactly the same. It appears that "for" loops are optimized differently in the master thread than in the other threads.

Example:

#include <iostream>
#include <omp.h>
int main()
{
    std::cout.precision(17);
    #pragma omp parallel for schedule(static, 1) ordered
    for(int i=0; i<4; i++)
    {
        double sum = 0.;
        for(int j=0; j<10; j++)
        {
            sum += 10.1;
        }
        #pragma omp ordered
        std::cout << "thread " << omp_get_thread_num() <<  " says " << sum << "\n";
    }
    return 0;
}

produces

thread 0 says 101
thread 1 says 100.99999999999998579
thread 2 says 100.99999999999998579
thread 3 says 100.99999999999998579

Can I somehow make sure that all threads receive the same optimization as my single-threaded programs (which didn't use OpenMP) received?

EDIT:

The compiler is "compiler and GDB debugger from TDM-GCC (version 4.9.2, 32 bit, SJLJ)", whatever that means. It's the IDE's "default". I'm not familiar with compiler differences.

The output provided comes from the "Release" build, which adds the "-O2" argument.

None of the "-O", "-O1" and "-O3" arguments produces a "101".

You can try my .exe from dropbox (zip file, also contains possibly required dlls).

Could you provide more information? Which compiler are you using? For instance, if I use gcc 4.8.x I get: thread X says 100.99999999999999 for all threads. That being said, most OpenMP runtimes simply create a new outlined routine for the loop body and every thread executes the same routine. — Harald, Mar 01 '16 at 13:35
@Gilles was it the same IDE? Because Code::Blocks adds extra arguments during build, like "-O2". — Stratubas, Mar 01 '16 at 14:10
I tried in both 32 and 64b mode with gcc 4.9.3 and gcc 5.3.0, with various levels of optimisation: I always get all the threads printing the same output, which is most of the time "100.99999999999999". **Only** when in 32b mode and with -O2 do I get "101", but for all threads, so no discrepancies, even here... — Gilles, Mar 01 '16 at 14:23
It might just be that since the code is also executed on the main thread that no function is created there and that the compiler can do constant folding using higher precision floats or that contrary to the function call everything fits into an extended precision register and is never spilt on the stack. — Voo, Mar 01 '16 at 14:25
@Gilles I only get 101 for thread 0 and only with -O2. Could it be the hardware? Could it be the OpenMP version (I have no idea which one I'm using...)? — Stratubas, Mar 01 '16 at 14:42

score 2 · Answer 1 · edited May 23 '17 at 12:14

This happens because the float and double data types cannot represent some numbers, such as 20.2, exactly:

#include <iostream>
int main()
{
    std::cout.precision(17);
    double a=20.2;
    std::cout << a << std::endl;
    return 0;
}

Its output will be

20.199999999999999

For more information on this, see "Unexpected Output when adding two float numbers".

I don't know why this does not happen for the first thread, but if you remove OpenMP you will still get the same result.

edited May 23 '17 at 12:14 by Community

answered Mar 01 '16 at 15:44 by Sarthak Singh

The issue is why the threads that should be doing exactly the same thing are getting different results, not why the result might not be what you'd expect with rational numbers. – Voo Mar 01 '16 at 16:27
@Voo is correct. I know I won't be getting 20.2, but if I multiply this by 10, I need to have 201.999999999999997 or 202 consistently, whichever the single-threaded program would produce. – Stratubas Mar 01 '16 at 17:02

score 0 · Answer 2 · answered Mar 01 '16 at 15:36

As far as I can tell, this is simply numerical accuracy. For a double you should expect about 16 significant decimal digits of precision.

I.e. the result is 101 +/- 1.e-16*101

This is exactly the range you get. Unless you use something like quadruple precision, this is as good as it gets.

answered Mar 01 '16 at 15:36 by Mohammed Li

The issue is why the threads that should be doing exactly the same thing are getting different results, not why the result might not be what you'd expect with rational numbers. – Voo Mar 01 '16 at 16:27
The original question was "I want to improve some of my older programs. I need to be sure that the results will be exactly the same." Within the achievable precision, the results are the same. Unless you use some quadruple-precision data type, this will not get better. And even if you do, you will face the exact same problem, just shifted by 16 digits. – Mohammed Li Mar 01 '16 at 16:35
@Voo is correct. In my programs the last digits matter, because of the "butterfly effect" (https://en.wikipedia.org/wiki/Butterfly_effect). – Stratubas Mar 01 '16 at 16:58
@Stratubas "In my programs the last digits matter" hmm, if you're using any physical measurements as input, they clearly can't. The length of the metre is not known to that accuracy, and other base units have similar problems! – Jim Cownie Mar 03 '16 at 10:19
@Jim Sure... but I'm not using physical measurements :-) – Stratubas Mar 03 '16 at 18:16
