10

What is the proper way to parallelize a multi-dimensional embarrassingly parallel loop in OpenMP? The number of dimensions is known at compile-time, but which dimensions will be large is not. Any of them may be one, two, or a million. Surely I don't want N omp parallel's for an N-dimensional loop...

Thoughts:

  • The problem is conceptually simple. Only the outermost 'large' loop needs to be parallelized, but the loop dimensions are unknown at compile-time and may change.

  • Will dynamically setting omp_set_num_threads(1) and #pragma omp for schedule(static, huge_number) make certain loop parallelizations a no-op? Will this have undesired side-effects/overhead? Feels like a kludge.

  • The OpenMP Specification (2.10, A.38, A.39) describes the difference between conforming and non-conforming nested parallelism, but doesn't suggest the best approach to this problem.

  • Re-ordering the loops is possible but may result in a lot of cache-misses. Unrolling is possible but non-trivial. Is there another way?

Here's what I'd like to parallelize:

for(i0=0; i0<n[0]; i0++) {
  for(i1=0; i1<n[1]; i1++) {
    ...
       for(iN=0; iN<n[N]; iN++) {
         <embarrassingly parallel operations>
       }
    ...
  }
}

Thanks!

  • 1
    +1 for a well presented question – pmg Mar 13 '11 at 12:38
  • 1
    Getting the right answer is all about asking the right question. 'Course it doesn't hurt to reference the spec too. :) –  Mar 13 '11 at 16:48

1 Answer

9

The collapse clause is probably what you're looking for, as described here. It essentially fuses the nested loops into a single loop, which is then parallelized, and it is designed for exactly these sorts of situations. So you'd do:

#pragma omp parallel for collapse(N)
for(int i0=0; i0<n[0]; i0++) {
  for(int i1=0; i1<n[1]; i1++) {
    ...
       for(int iN=0; iN<n[N]; iN++) {
         <embarrassingly parallel operations>
       }
    ...
  }
}

and be all set.

Jonathan Dursi
  • Thanks! Dang, that's easy. I saw that, thought it wouldn't work for some reason, then forgot about it. Yup. Looks to be just right. And looks like nesting it with `#omp parallel{ #omp for collapse{ #omp parallel{ #omp for collapse{ ... } } } }` is valid. Not that it's a good idea, but it's for function evaluation on large datasets so f(g(x)) should be perfectly valid. Anyway, thanks! –  Mar 13 '11 at 13:40
  • 2
    Two things to note though. First, the collapse clause is only in OpenMP V3.0 and above. Second, while you don't have to specifically make the loop iteration variables private when using the collapse clause, if you remove the collapse clause, then you had better either declare them as above (using C99 syntax) or put them in a private clause. Otherwise they will be shared and you will have a problem. – ejd Mar 13 '11 at 17:19
  • 1
    Using gcc 4.4.4 which implements 3.0. Thanks for reminding me to check. And I like `#pragma omp parallel default(none)`, just so you don't get careless. Also FYI, turns out the index can't be an array element, i.e. `int i[N]`. Compiler error. –  Mar 13 '11 at 17:32