OpenMP - Random running time - why having so high run-time variance?

Question

I am following Tim Mattson's lectures on OpenMP to learn how to implement some parallel programming concepts.

I was trying to observe the running time behavior of a parallel program that computes the value of PI using 3x10^8 steps.

Here is the code:

#include <omp.h>
#include <stdio.h>

static long num_steps = 300000000;
double step;
#define PAD 8 // tried 50 too
#define NUM_THREADS 4
int main()
{
    int i, nthreads;
    double pi, sum[NUM_THREADS][PAD];
    double ts, te;

    ts = omp_get_wtime();

    step = 1.0/(double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id,nthrds;
        double x;

        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0)  nthreads = nthrds;
        for (i=id, sum[id][0]=0.0;i< num_steps; i=i+nthrds) {
            x = (i+0.5)*step;
            sum[id][0] += 4.0/(1.0+x*x);
        }
    }

    for(i=0, pi=0.0;i<nthreads;i++)
        pi += sum[i][0] * step;

    te = omp_get_wtime();

    printf("%.10f\n", pi);
    printf("%f\n", te-ts);

}

I ran this on Ubuntu 14.04 LTS on a dual-core machine. A call to omp_get_num_procs() returned 2. The running time looked almost random, ranging from 1.31 seconds to 4.46 seconds, whereas the serial program took about 2.31 seconds almost every time.

I tried creating 1, 2, 3, 4, and up to 10 threads. The running time varied widely in every case, though the average was smaller with more threads. I wasn't running any other applications.

Can anyone explain why the running time varied so much?

How can I measure the running time accurately? The lecturer reported running times on his computer that seem consistent, and he was also using a dual-core processor.

What compile options did you use? Did you compile with `-O3`? — Z boson, Apr 18 '18 at 06:23
You would require an affinity setting such as OMP_NUM_THREADS=2 OMP_PLACES=cores (or proprietary equivalent, if using an old OpenMP). Otherwise you don't maintain an even distribution of threads across cores. By a possibly oversimplified analysis, running 2 threads placed randomly on 2 cores, without a smart OS scheduler, you will have both threads on one core 50% of the time and expect ideally 33% parallel speedup. It won't be consistent run to run. — tim18, Apr 18 '18 at 12:48
@tim18 Thank you for the insight. I am going to pick up the ideas soon. — Shakil Ahamed, Apr 18 '18 at 19:08
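
The 33% figure in tim18's comment can be reproduced with a short expected-value calculation. This is only a sketch under the comment's own simplifying assumptions (two threads, two cores, each placement equally likely, no scheduling or startup overhead). The symbols T_1 (serial run time) and S (speedup) are introduced here for illustration:

```latex
\mathbb{E}[T] = \tfrac{1}{2}\,T_1 + \tfrac{1}{2}\cdot\frac{T_1}{2} = \tfrac{3}{4}\,T_1,
\qquad
S = \frac{T_1}{\mathbb{E}[T]} = \frac{4}{3} \approx 1.33
```

Under these assumptions the average speedup is about 33%, but a single run lands on either T_1 or T_1/2, which is one plausible source of run-to-run variance.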

user3666197 · Accepted Answer · 2018-04-18T11:11:02.593

Dual-CPU comparison, using OpenMP :

Result          : 3.1415926536
Number of CPU-s : 2  
Duration        : 2.4025482161

There seems to be a pretty consistent set of resulting code-execution times:

/*           Duration        : 2.3984972970
             Duration        : 2.4004815188
             Duration        : 2.3814983589
             Duration        : 2.4070654172
             Duration        : 2.3964317020
             Duration        : 2.3858104548
             Duration        : 2.3765923560
             Duration        : 2.3734730321
    -O3:
             Duration        : 0.4159400249
             Duration        : 0.3089567909
             Duration        : 0.3106977220
             Duration        : 0.3312316008
             Duration        : 0.2856188160
             Duration        : 0.2984415500
             Duration        : 0.3282426349
             Duration        : 0.2836121118
                                    :......
  + FYI:     #pragma-overheads      :......
             Duration        : 0.0001377461                                                                                           
             Duration        : 0.0001228561
             Duration        : 0.0001215260
    REF:
    Amdahl's Law             >>> https://stackoverflow.com/revisions/18374629/3
    criticism,
    on
    (not-)including also the real-world's infrastructure add-on
    { setup | termination }-overhead costs of #pragma omp parallel section
    ( 
      simplified test w/o the add-on costs of global OpenMP setup & configuration
      )

             */

which turns attention to the background workload noise on your system under test.

Best re-test your code on a headless platform, so that no GUI-related workloads interfere with the computing part of the test.

You may enjoy this sandboxed online TiO platform to re-run the experiments.
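
One way to make background noise visible is to time the computation several times and compare the minimum with the median, instead of trusting a single run. The following is a minimal sketch of that idea, not the code that produced the durations above. The helper names `now_sec`, `compute_pi` and `time_pi` are hypothetical. It uses a `reduction(+:sum)` clause in place of the padded `sum[][]` array from the question, and it does one untimed warm-up call so that thread-team startup stays out of the measurements. It also assumes a POSIX system for the non-OpenMP timer fallback.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() in the fallback timer */
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_REPS 64               /* arbitrary upper bound on repetitions */

/* Wall-clock seconds: omp_get_wtime() when built with OpenMP,
   otherwise clock_gettime() (assumption: a POSIX system). */
double now_sec(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (double)t.tv_sec + 1e-9 * (double)t.tv_nsec;
#endif
}

/* Midpoint-rule estimate of pi. The reduction clause gives each thread
   a private partial sum, so no manual padding is needed. */
double compute_pi(long num_steps)
{
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

/* Ascending comparison for qsort() */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: runs compute_pi() `reps` times, stores the minimum
   and median wall-clock duration, and returns the last pi estimate. */
double time_pi(long num_steps, int reps, double *t_min, double *t_median)
{
    double t[MAX_REPS];
    double pi = 0.0;
    volatile double sink;         /* keeps the warm-up call from being optimized away */
    int r;

    assert(reps > 0 && reps <= MAX_REPS);

    sink = compute_pi(num_steps); /* untimed warm-up: creates the thread team */
    (void)sink;

    for (r = 0; r < reps; r++) {
        double t0 = now_sec();
        pi = compute_pi(num_steps);
        t[r] = now_sec() - t0;
    }

    qsort(t, (size_t)reps, sizeof t[0], cmp_double);
    *t_min = t[0];
    *t_median = (reps % 2) ? t[reps / 2]
                           : 0.5 * (t[reps / 2 - 1] + t[reps / 2]);
    return pi;
}
```

To use it, call something like `time_pi(300000000, 9, &tmin, &tmed)` from `main()` and print both values. A median far above the minimum suggests interference from other work on the machine. Built with `gcc -O3 -fopenmp` the loop runs in parallel. Without `-fopenmp` the pragma is ignored and the same code gives a serial baseline.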
