The only sensible way is to cut off the parallelism at a certain level, below which the overhead becomes larger than the work being done. The best way to do that is to have two separate fib implementations - a serial one, let's say fib_ser, and a parallel one, fib - and to switch between them below a given threshold.
int fib_ser(int n)
{
    /* plain recursive Fibonacci, no OpenMP constructs */
    if (n < 2)
        return n;
    else
        return fib_ser(n-1) + fib_ser(n-2);
}
int fib(int n)
{
    int i, j;
    if (n <= 20)
        return fib_ser(n);           /* below the threshold: run serially */
    else
    {
        #pragma omp task shared(i)
        i = fib(n-1);
        #pragma omp task shared(j)
        j = fib(n-2);
        #pragma omp taskwait         /* wait for both child tasks */
        return i+j;
    }
}
Here the threshold is n == 20. It was chosen arbitrarily, and its optimal value will differ between machines and OpenMP runtimes.
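Because fib creates explicit tasks, it has to be called from inside a parallel region, normally from a single construct so that only one thread starts the top-level call while the rest of the team picks up the generated tasks. A minimal driver might look like the sketch below; main and the input value 40 are illustrative additions, not part of the code above.

#include <stdio.h>

int fib(int n);                          /* the parallel version above */

int main(void)
{
    int n = 40;                          /* example input, chosen arbitrarily */
    int result;

    #pragma omp parallel
    {
        /* one thread creates the top-level task tree;
           the other threads execute the tasks it generates */
        #pragma omp single
        result = fib(n);
    }

    printf("fib(%d) = %d\n", n, result);
    return 0;
}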
Another option would be to control the tasking dynamically with the if clause:
int fib(int n)
{
    int i, j;
    if (n < 2)
        return n;
    else
    {
        #pragma omp task shared(i) if(n > 20)
        i = fib(n-1);
        #pragma omp task shared(j) if(n > 20)
        j = fib(n-2);
        #pragma omp taskwait
        return i+j;
    }
}
This switches off the two explicit tasks whenever n <= 20 and the code executes serially, but there is still some overhead from the OpenMP code transformation: a task whose if clause evaluates to false is still generated as an undeferred task and executed immediately by the encountering thread, so this version runs slower than the previous one with the separate serial implementation.
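One way to verify this on a given machine is to time the two variants with omp_get_wtime(). The sketch below assumes the two versions have been renamed fib_cutoff (explicit serial fallback) and fib_if (if clause) so they can be compiled together; those names, the timing harness, and the input value are illustrative, not part of the original code.

#include <stdio.h>
#include <omp.h>

int fib_cutoff(int n);                   /* hypothetical name for the first version */
int fib_if(int n);                       /* hypothetical name for the second version */

static double time_fib(int (*f)(int), int n, int *result)
{
    double start = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single
        *result = f(n);
    }
    return omp_get_wtime() - start;
}

int main(void)
{
    int n = 40, r1, r2;                  /* example input, chosen arbitrarily */
    double t1 = time_fib(fib_cutoff, n, &r1);
    double t2 = time_fib(fib_if, n, &r2);
    printf("serial fallback: fib(%d) = %d in %.3f s\n", n, r1, t1);
    printf("if clause:       fib(%d) = %d in %.3f s\n", n, r2, t2);
    return 0;
}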