
I am learning how to use OpenMP in a C program. I noticed that "#pragma omp atomic" increases the runtime even when the number of threads is 1, while updating a 1D array. Here is my code:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <mpi.h>
#include <omp.h>

double fixwork(int a, int n) //n==L
{
    int j;
    double s, x, y;
    double t = 0;
    for (j = 0; j < n; j++)
    {
        s = 1.0 * j * a;
        x = (1.0 - cos(s)) / 2.0;
        y = 0.31415926 * x; 
        t += y;
    }

    return t;
}

int main(int argc, char* argv[])
{
    int n = 100000;
    int p = 1;
    int L = 2;
    int q = 100;
    int g = 7;
    int i, j, k;
    double v;

    int np, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    double* u = (double*)calloc(n * g, sizeof(double));
    double* w = (double*)calloc(n * g, sizeof(double));
    
    double omptime1 = -MPI_Wtime();
#pragma omp parallel for private(k, j, v) num_threads(p)
    for (i = 0; i < n; i++)
    {
        k = i * (int)ceil(1.0 * (i % q) / q);
        for (j = 0; j < g; j++)
        {
            v = fixwork(i * g + j, L);
#pragma omp atomic 
            u[k] += v;
        }
    }
    omptime1 += MPI_Wtime();
    
    printf("\npragma time = %f", omptime1);
    MPI_Finalize();
    return 0;
}

I compiled this code with:

mpiicc -qopenmp atomictest.c -o atomic

With 1 OpenMP thread and 1 MPI process, the observed ratio time(with atomic)/time(without atomic) is ~1.28 (n=1e6) and ~1.07 (n=1e7), and it is even larger for smaller n. This suggests that the atomic directive itself adds overhead. What is the reason for this slowdown? What is the difference between the machine operations of "omp atomic" and "c++ atomic"? Thanks

STao34
  • Comparing execution times of unoptimized code is not worth the time. What differences do you get when optimizing (with `-O3`)? Do you get different performance using `std::atomic` (since you mention it in your question, although you've tagged this as C)? – 1201ProgramAlarm Jun 09 '21 at 16:48
  • As a matter of general style, please, please, please, declare your variables in their minimal scope. That has been legal in C since at least C99 (so for over 20 years). It makes it easier for other people (and the compiler) to understand the code, and often avoids bugs when you parallelise with OpenMP, since you don't need to add `private` declarations for variables which are declared inside what becomes the parallel region. – Jim Cownie Jun 10 '21 at 08:18
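
To illustrate the minimal-scope suggestion in the last comment, here is a sketch of the question's loop with `i`, `j`, `k` and `v` declared where they are used, so no `private` clause is needed. The function name `accumulate_scoped` is hypothetical, `v = 1.0` is a placeholder for the call to `fixwork`, and the index `k` is written as a conditional that should be equivalent to the original `ceil` expression for non-negative `i` and positive `q`.

```c
#include <stddef.h>

/* Hypothetical helper: same loop structure as in the question, but every
 * loop-local variable is declared inside the parallel loop body, which makes
 * it private to each thread automatically. */
void accumulate_scoped(double *u, int n, int g, int q, int threads)
{
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; i++)
    {
        /* Assumed equivalent to i * (int)ceil(1.0 * (i % q) / q):
         * 0 when i is a multiple of q, otherwise i. */
        int k = (i % q == 0) ? 0 : i;

        for (int j = 0; j < g; j++)
        {
            double v = 1.0; /* placeholder for fixwork(i * g + j, L) */

            /* u[0] is updated from many iterations, so the update is atomic. */
            #pragma omp atomic
            u[k] += v;
        }
    }
}
```

Only `u`, `n`, `g`, `q` and `threads` are shared here, and all of them except the contents of `u` are read-only inside the loop.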

1 Answer


It is partially answered here:

If you enable OpenMP, gcc has to generate different code that works for any number of threads that is only known at runtime..... The compiler has to use different atomic instructions that are likely more costly...
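
The point in the quote can be sketched with two versions of the same accumulation. This is a minimal illustration, not a benchmark: the function names `sum_plain` and `sum_atomic` are hypothetical, and the comments describe what compilers commonly do rather than a guarantee. With one thread both return the same value, but the atomic version forces each update to be an indivisible read-modify-write on memory, which typically prevents the compiler from keeping the accumulator in a register or vectorizing the loop.

```c
#include <assert.h>

/* Plain accumulation: the compiler is free to keep acc in a register
 * and to unroll or vectorize the loop. */
double sum_plain(const double *v, int n)
{
    assert(n >= 0);
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Atomic accumulation: when OpenMP is enabled, each += must be performed
 * as one indivisible update of acc in memory, even if only one thread runs.
 * For a double this is commonly implemented as a compare-and-swap loop. */
double sum_atomic(const double *v, int n, int threads)
{
    assert(n >= 0 && threads >= 1);
    double acc = 0.0;
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; i++)
    {
        #pragma omp atomic
        acc += v[i];
    }
    return acc;
}
```

Compiling both functions with optimization enabled and comparing the generated assembly for the two loops is one way to see where the extra cost comes from on a given compiler.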

dreamcrash
Laci