
I'm developing some parallel C++ simulation code which I want to vectorise as effectively as possible. This is why I use both template parameters and OpenMP SIMD directives:

  • Template parameters are here to resolve some of the conditions that could occur inside the most critical loops, by resolving them at compilation time and removing the corresponding branching altogether.
  • OpenMP SIMD directives force the compiler to generate vectorised code.

A (stupid) example of what I mean could be as follows:

#include <iostream>

template< bool checkNeeded >
int ratio( double *res, double *num, double *denom, int n ) {
    #pragma omp simd
    for ( int i = 0; i < n; i++ ) {
        if ( checkNeeded ) { // dead code removed by the compiler when template is false
            if ( denom[i] == 0 ) { // test the current element, not the pointer
                std::cout << "Houston, we've got a problem\n";
                return i;
            }
        }
        res[i] = num[i] / denom[i];
    }
    return n;
}

Overall this works well. The trouble is that in the (very rare) cases where I want to use the ratio<true>() version, the compiler still vectorises it because of the #pragma omp simd directive, and with the tests, printing and early exits from the loop, the result is far slower than the non-vectorised version. What I need is an if clause on the simd directive, telling the compiler when to obey the directive. It would look something like this:

#pragma omp simd if( checkNeeded == false )

Unfortunately, although such if clauses are supported on numerous OpenMP directives, simd is not one of them. I don't think my request is completely stupid, so I wonder why it is so, and whether it is likely to be supported in the future. Does anybody know about that?

Gilles
  • Macro, good old macro – user3528438 Aug 18 '15 at 15:07
  • And why not explicitly specialize the code into two different versions? – user3528438 Aug 18 '15 at 15:10
  • That is certainly a possibility. However, I don't much like macros since I find they damage the code's readability. Indeed, with an `#if` kind of approach, you are never quite sure what the compiler actually compiles. But true, macros should address the issue in most cases, so I'll consider that. As for – Gilles Aug 18 '15 at 16:17
  • How often do you do the check in the real code? Is it done once or several times? – jefflarkin Aug 18 '15 at 19:52
  • In the real code, only the version with `checkNeeded` set to `false` is called for a normal run. Only if something bad happens can we decide to rerun the specific test case with the version where `checkNeeded` is set to `true`, thanks to a command line option. So this is basically a post-mortem debug option. Simply, if the initial run crashed after a few hours, we don't want the post-mortem to last for days before reaching the point where the problem occurred, hence the need of the loop not to be vectorised in this case. – Gilles Aug 18 '15 at 21:25

3 Answers


I don't think my request is completely stupid, so I wonder why it is so, and whether it is likely to be supported in the future. Does anybody know about that?

The SIMD directives affect code generation at compile time, whereas the "if" clause on other OpenMP constructs implements a run-time test (the "if" condition does not have to be a compile-time constant). Implementing an "if" on the SIMD directive would therefore, in general, require the compiler to clone the loop and generate two distinct versions, then choose which one to execute at run time.

That seems a lot of effort for a very rare case, so I doubt that it will make it into the standard. (And, in any case, at this point the first standard you could look for it to be in won't be out for a few years, so you likely need a more pragmatic fix :-))
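To make the cloning argument concrete, here is a minimal hand-written sketch of what a run-time "if" on the SIMD directive would amount to. The function name `ratio_runtime_if` and the `vectorise` flag are hypothetical, introduced only for illustration; they are not part of OpenMP or of the question's code.

```cpp
#include <cassert>

// Hypothetical helper: emulates by hand what a run-time "if" on the simd
// directive would require, namely two copies of the loop and a run-time
// choice between them.
int ratio_runtime_if(double *res, const double *num, const double *denom,
                     int n, bool vectorise) {
    assert(n >= 0);
    if (vectorise) {
        // Copy 1: the compiler is asked to vectorise this loop.
        #pragma omp simd
        for (int i = 0; i < n; i++) {
            res[i] = num[i] / denom[i];
        }
    } else {
        // Copy 2: the same loop without the directive.
        for (int i = 0; i < n; i++) {
            res[i] = num[i] / denom[i];
        }
    }
    return n;
}
```

Both copies compute the same result; only the code the compiler generates for them may differ. When the condition is a template parameter, as in the question, the selection collapses to a compile-time choice and only one copy survives in each instantiation.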

Jim Cownie
  • Oh yes, that's for sure. But indeed, the idea was to have several (actually two here) versions of the function, depending on the template parameter. I could probably get it with some convoluted macro, but that would mean putting the entire code into the macro since it would have to be copied in different places... Anyway, if I want something working, I need, as you said, a pragmatic approach, even if it feels a bit frustrating. That said, Jim, since you're on the OpenMP committee, do you think requesting such a feature would make sense? Should I make a comment on the 4.1 draft? – Gilles Aug 19 '15 at 09:09
  • Sure, by all means send in a comment on the 4.1 draft. One thing you can be sure of is that if you don't ask you don't get! – Jim Cownie Aug 20 '15 at 08:26

Expanding upon user3528438's comments, this is probably one of the most logical places to split your function into two different functions. One handles the false case and is written as you have it, and the other handles the true case and does not have the simd directive.

Alternatively, if you insist on using one function, you could very easily write

template< bool checkNeeded >
int ratio( double *res, double *num, double *denom, int n ) {
    if (!checkNeeded) {
        #pragma omp simd
        for ( int i = 0; i < n; i++ ) {
            res[i] = num[i] / denom[i];
        }
        return n;
    } else {
        for ( int i = 0; i < n; i++ ) {
            if ( denom[i] == 0 ) {
                std::cout << "Houston, we've got a problem\n";
                return i;
            }
            res[i] = num[i] / denom[i];
        }
        return n;
    }
}

In the false case this costs at most one extra branch per call, which you should not notice for any reasonably large n (even n greater than 10 should be enough); and since checkNeeded is a template parameter, the compiler will normally resolve that branch at compile time and remove it altogether. In the true case it will be much faster, because the loop containing the check and the early exit is no longer forced to vectorise.
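If duplicating the loop body is a maintenance concern, one possible way to keep a single copy of the compute core is to move it into a small inline function called from both loops. The sketch below assumes the core can be expressed per element; `compute_one` and `ratio_shared` are hypothetical names, and whether the simd loop still vectorises well depends on the compiler inlining the helper.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for the real, more complex compute core.
inline void compute_one(double *res, const double *num,
                        const double *denom, int i) {
    res[i] = num[i] / denom[i];
}

template <bool checkNeeded>
int ratio_shared(double *res, const double *num, const double *denom, int n) {
    assert(n >= 0);
    if (!checkNeeded) {
        // Fast path: no checks, vectorisation requested.
        #pragma omp simd
        for (int i = 0; i < n; i++) {
            compute_one(res, num, denom, i);
        }
        return n;
    }
    // Debug path: scalar loop with the check and the early exit.
    for (int i = 0; i < n; i++) {
        if (denom[i] == 0) {
            std::cout << "Houston, we've got a problem\n";
            return i;
        }
        compute_one(res, num, denom, i);
    }
    return n;
}
```

A normal run would call `ratio_shared<false>(res, num, denom, n)` and a post-mortem run `ratio_shared<true>(...)`. The loop headers are still written twice, but a change to the algorithm only touches `compute_one`.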

NoseKnowsAll
  • 100% agreed on the principle; however, in the real function, the compute part is way more complex and the test part very small in comparison. That's why writing the core of the compute loop twice is just annoying and potentially error prone when the algorithm is modified. – Gilles Aug 18 '15 at 16:29
  • Then a macro does seem like the way to go, although I understand your reservation in wanting to use them. However, you can't eat your cake and have it too in this situation. – NoseKnowsAll Aug 18 '15 at 16:35
  • Well, in regard to the cake, having the `if` clause available on the OpenMP `simd` directive would just permit it ;) – Gilles Aug 18 '15 at 17:35
  • I don't think having the `if` clause would make a difference. You can fake the `if` clause by setting the `safelen` to 1 when you don't want the SIMD directive and `n` when you do (assuming you're actually safe for *all* iterations). But then you're still counting on the compiler to eliminate the dead code in the `if` statements so that the code actually can be vectorized. You're probably better off hoisting the check into a separate loop so that the function exits before the SIMD loop if the check fails or doesn't execute the test if it's not needed. – jefflarkin Aug 18 '15 at 19:37
  • Thanks for the ideas. I'll keep them in mind and will experiment a bit on that. – Gilles Aug 18 '15 at 21:28
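The hoisting idea from the comments above can be sketched as follows. This is only an illustration under one strong assumption: the check depends on the input data alone (here, a zero denominator), not on values produced inside the compute loop. The name `ratio_hoisted` is hypothetical.

```cpp
#include <cassert>
#include <iostream>

template <bool checkNeeded>
int ratio_hoisted(double *res, const double *num, const double *denom, int n) {
    assert(n >= 0);
    int limit = n;  // number of elements that are safe to compute
    if (checkNeeded) {
        // Scalar pre-pass: locate the first problematic element, if any.
        for (int i = 0; i < n; i++) {
            if (denom[i] == 0) {
                std::cout << "Houston, we've got a problem\n";
                limit = i;
                break;
            }
        }
    }
    // Single copy of the compute loop, free of branches and early exits.
    #pragma omp simd
    for (int i = 0; i < limit; i++) {
        res[i] = num[i] / denom[i];
    }
    return limit;
}
```

As in the original function, the elements before the faulty index are computed and that index is returned. The compute loop is written once and stays vectorisable in both instantiations; the price in the `true` case is an extra pass over `denom`.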
template< bool checkNeeded >
int ratio( double *res, double *num, double *denom, int n ); // primary template declaration

// checkNeeded == false: no checks, so the loop can be vectorised
template<> int ratio<false>( double *res, double *num, double *denom, int n ) {
    #pragma omp simd
    for ( int i = 0; i < n; i++ ) {
        res[i] = num[i] / denom[i];
    }
    return n;
}

// checkNeeded == true: scalar loop with the check and early exit
template<> int ratio<true>( double *res, double *num, double *denom, int n ) {
    for ( int i = 0; i < n; i++ ) {
        if ( denom[i] == 0 ) {
            std::cout << "Houston, we've got a problem\n";
            return i;
        }
        res[i] = num[i] / denom[i];
    }
    return n;
}

How do I explicitly instantiate a template function?

user3528438
  • Thank you for this, but it doesn't address the point that in the end, the core of the function is copied twice and therefore needs to be maintained twice. This actually just renders the use of templates pointless. Aside from that, any thoughts on the initial question about the lack of an `if` clause on the OpenMP `simd` directive? – Gilles Aug 18 '15 at 17:31