
I implemented two versions of the pi approximation. I tested them and noticed that one version is much faster, but I don't really understand why. In the first version I created an array whose size is the number of threads, and each thread updates its own index. In the second version I just used a reduction.

First version:

#pragma omp parallel private(x) shared(sum_vector)
{
    int tid = omp_get_thread_num();
    for (int i = tid; i < num_steps; i += threads_number){
        x = (i+0.5)*step;
        sum_vector[tid] += 4.0/(1.0+x*x);
    }
}

Second version:

#pragma omp parallel reduction(+:sum) private(x)
{
    int nthreads = omp_get_num_threads();
    int id = omp_get_thread_num();
    for (int i = id; i < num_steps; i += nthreads){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }

}

The second version is almost twice as fast for 1 million iterations or more.
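For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch of both versions wrapped in functions. The names `pi_array` and `pi_reduction` are hypothetical wrappers, not part of my original program. The final summing of `sum_vector` and the multiplication by `step` are assumptions about the parts of the code not shown in the snippets above. The serial fallbacks are only there so the sketch also builds without `-fopenmp`.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: only needed when built without -fopenmp). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Version 1: each thread accumulates into its own slot of a shared array. */
double pi_array(long num_steps, int threads_number) {
    double step = 1.0 / (double)num_steps;
    double *sum_vector = calloc((size_t)threads_number, sizeof *sum_vector);
    if (!sum_vector) return 0.0;

    #pragma omp parallel num_threads(threads_number) shared(sum_vector)
    {
        int tid = omp_get_thread_num();
        /* Use the actual team size as the stride, in case the runtime
           grants fewer threads than requested. */
        int nthreads = omp_get_num_threads();
        for (long i = tid; i < num_steps; i += nthreads) {
            double x = (i + 0.5) * step;
            sum_vector[tid] += 4.0 / (1.0 + x * x);
        }
    }

    /* Combine the per-thread partial sums serially. */
    double sum = 0.0;
    for (int t = 0; t < threads_number; t++) sum += sum_vector[t];
    free(sum_vector);
    return sum * step;
}

/* Version 2: each thread accumulates into a private copy of sum,
   and OpenMP combines the copies at the end of the parallel region. */
double pi_reduction(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    #pragma omp parallel reduction(+:sum)
    {
        int nthreads = omp_get_num_threads();
        int id = omp_get_thread_num();
        for (long i = id; i < num_steps; i += nthreads) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
    }
    return sum * step;
}
```

To time them, each call can be bracketed with `omp_get_wtime()` in a small `main`, compiled with something like `gcc -O2 -fopenmp`. The exact timings will depend on the machine and the thread count.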

I would appreciate every answer! Thank you in advance!

whoami1996