I took some of my old OpenMP exercises to practice a little bit, but I am having difficulty finding the solution for one in particular.

The goal is to write the simplest OpenMP code that corresponds to a given dependency graph.

The graphs are visible here: https://i.stack.imgur.com/MAzu1.jpg

The first one is simple.

It corresponds to the following code:

#pragma omp parallel
{
#pragma omp simple
  {
#pragma omp task
    {
       A1();
       A2();
    }
#pragma omp task
    {
       B1();
       B2();
    }
#pragma omp task
    {
       C1();
       C2();
    }
  }
}

The second one is still easy.

#pragma omp parallel
{
#pragma omp simple
  {
#pragma omp task
    {
       A1();
    }
#pragma omp task
    {
       B1();
    }
#pragma omp task
    {
       C1();
    }
#pragma omp barrier
    A2();
    B2();
    C2();
  }
}

And now comes the last one… which is bugging me quite a bit because the number of dependencies is unequal across the function calls. I thought there was a way to explicitly state which task you should be waiting for, but I can't find what I'm looking for in the OpenMP documentation.

If anyone has an explanation for this question, I will be very grateful, because I've been thinking about it for more than a month now.

2 Answers

First of all, there is no #pragma omp simple in the OpenMP 4.5 specification. I assume you meant #pragma omp single.

If so, #pragma omp barrier is a bad idea inside a single region, since only one thread will execute the code and it will wait for all the other threads, which never execute the region.

Additionally, in the second one A2, B2 and C2 are no longer executed in parallel as tasks.

To your actual question: what you are looking for seems to be the depend clause for task constructs, OpenMP Specification pg. 169.

There is a pretty good explanation of the depend clause and how it works by Massimiliano for this question.

Henkersmann
The last example is not that complex once you understand what is going on there: each task at position n of iteration t depends on the task at the same position in the previous iteration AND on its two neighbors (positions n-1 and n+1 of iteration t-1). This access pattern is known as a Jacobi stencil, and it is very common in partial differential equation solvers.

As Henkersmann said, the easiest option is using OpenMP Task's depend clause:

// N elements per row and T time steps are assumed to be defined
int val_a[N], val_b[N];
#pragma omp parallel
#pragma omp single
{
int *a = val_a;
int *b = val_b;
for( int t = 0; t < T; ++t ) {
  // Unroll the inner loop for the boundary cases
  #pragma omp task depend(in:a[0],a[1]) depend(out:b[0])
  stencil(b, a, 0);

  for( int i = 1; i < N-1; ++i ) {
     #pragma omp task depend(in:a[i-1],a[i],a[i+1]) \
                 depend(out:b[i])
     stencil(b, a, i);
  }

  #pragma omp task depend(in:a[N-2],a[N-1]) depend(out:b[N-1])
  stencil(b, a, N-1);

  // Swap the pointers for the next iteration
  int *tmp = a;
  a = b;
  b = tmp;
}
#pragma omp taskwait
}

As you may see, OpenMP task dependences are point-to-point, which means you cannot express them in terms of array regions.

Another option, a bit cleaner for this specific case, is to enforce the dependences indirectly, using a barrier:

int val_a[N], val_b[N];
int *a = val_a, *b = val_b;
#pragma omp parallel
for( int t = 0; t < T; ++t ) {
  #pragma omp for
  for( int i = 0; i < N; ++i ) {
     stencil(b, a, i);
  }
  // Swap the pointers for the next iteration (single implies a barrier)
  #pragma omp single
  { int *tmp = a; a = b; b = tmp; }
}

This second version performs a synchronization barrier every time the inner loop finishes. The synchronization granularity is coarser, in the sense that you have only one synchronization point per outer-loop iteration. However, if the stencil function is long-running and unbalanced, it is probably worth using tasks.

Jorge Bellon