Code 1 shows a 'for' loop parallelized with OpenMP. I would like to achieve the same parallelization after unrolling the 'for' loops using template metaprogramming (see Code 2). Could you please help?

Code 1: Outer for loop run in parallel with four threads

void some_algorithm()
{
  // code
}

int main()
{
  #pragma omp parallel for
  for (int i=0; i<4; i++)
  {
    //some code
    for (int j=0;j<10;j++)
    {
      some_algorithm();
    }
  }
}

Code 2: Same as Code 1, but with the loops generated by templates. I want to run the outer for loop in parallel using OpenMP. How can I do that?¹

template <int I, int ...N>
struct Looper{
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I>{
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};


int main()
{
    Looper<4, 10>()(some_algorithm); 
}

¹ Thanks to Nim for Code 2, from "How to generate nested loops at compile time?"

πάντα ῥεῖ
coder
  • Maybe OpenMP is overkill for such small loops. Only loops with a reasonably small number of iterations can be unrolled. – πάντα ῥεῖ Oct 04 '20 at 12:32
  • OpenMP manages a thread pool in background. It's the first thing you should think about when doing parallel work without OpenMP. – rustyx Oct 04 '20 at 14:21
  • @πάντα ῥεῖ, I agree. However, in my case some_algorithm has very complex logic, so I prefer to run each iteration i of the outer loop in a separate thread. Basically, my question is: is it possible to combine OpenMP with template metaprogramming? – coder Oct 04 '20 at 17:35

1 Answer

If you remove the constexpr declarations, you can use _Pragma("omp parallel for"), something like this:

#include <omp.h>

template <int I, int ...N>
struct Looper{
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        _Pragma("omp parallel for if (!omp_in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I>{
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

void some_algorithm(...) {
}
int main()
{
    Looper<4, 10>()(some_algorithm); 
}

You can see this being compiled to use OpenMP at https://godbolt.org/z/nPrcWP (observe the call to GOMP_parallel...). The code also compiles with LLVM (switch the compiler to see :-)).
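To check that the parallel version still visits every (i, j) pair exactly as the serial loops would, here is a minimal sketch built on the same Looper. The names RecordVisit and count_visited are hypothetical, introduced only for this illustration. Each call writes to its own array slot, so the threads never write to the same element. The include of omp.h is guarded on _OPENMP so the sketch should also build serially when OpenMP is not enabled (with GCC or Clang, OpenMP is typically enabled with -fopenmp).

```cpp
#include <array>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>  // declares omp_in_parallel(); only needed when OpenMP is enabled
#endif

// Same Looper as above: only the outermost generated loop is parallelized,
// because the if clause disables the directive once we are inside a parallel region.
template <int I, int... N>
struct Looper {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        _Pragma("omp parallel for if (!omp_in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

// Innermost loop: runs serially and calls f with all loop indices.
template <int I>
struct Looper<I> {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

// Hypothetical callback (not from the question): marks the slot for (i, j).
// Every (i, j) maps to a distinct element, so concurrent calls do not race.
struct RecordVisit {
    std::array<int, 40> cell{};  // zero-initialized: 0 means "not visited"
    void operator()(int i, int j) {
        assert(i >= 0 && i < 4 && j >= 0 && j < 10);
        cell[i * 10 + j] = 1;
    }
};

// Hypothetical helper: runs the 4 x 10 loop nest and counts the visited slots.
int count_visited() {
    RecordVisit rec;
    Looper<4, 10>()(rec);
    int visited = 0;
    for (int v : rec.cell) {
        visited += v;
    }
    return visited;
}
```

Calling count_visited() from main should return 40 whether or not OpenMP is enabled, since the pragma is simply ignored in a serial build. If the callback needs to update shared state instead of separate slots, it would need its own synchronization (for example an OpenMP reduction or atomic), which this sketch does not cover.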

Jim Cownie