
The current OpenMP standard says about the declare simd directive for C/C++:

The use of a declare simd construct on a function enables the creation of SIMD versions of the associated function that can be used to process multiple arguments from a single invocation in a SIMD loop concurrently.

More details are given in the chapter, but there seems to be no restriction there to the type of function the directive can be applied to.

So my question is, can this directive be applied safely to an inline function?

I'm asking that for two reasons:

  1. An inline function is a rather unusual function, since it is normally inlined directly in the place it was called. So it is likely never compiled as a standalone function and therefore, the declare simd aspect of it is quite redundant with the possible simd directive at the enclosing loop's level.
  2. I have code with such inline declare simd functions, and sometimes, for reasons I could not pin down, GCC complains about their multiple definition at link time (with names mangled with extra characters suggesting that these are vectorised versions). But if I remove the declare simd directive, it compiles and links fine.

So far I hadn't thought too much about it, but now I'm puzzled. Is that a bug of mine (i.e. using declare simd for inline functions), or is it a problem in GCC generating binary vectorised versions of inline functions and failing to sort them out at link time?


EDIT:
There is a GCC compiler option which makes a difference. When inlining is enabled (with -O3 for example), the code compiles and links fine. But when it is compiled with -O0 or with -O3 -fno-inline, inlining is disabled and the linking fails with a "multiple definition of" error for the function decorated with the omp declare simd directive.


EDIT 2:
Thanks to @Zboson's questions regarding the compiler flags, I managed to create a reproducer. Here it is:

foobar.h:

#ifndef FOOBAR_H_
#define FOOBAR_H_

#include <cmath>

#pragma omp declare simd
inline double foo( double d ) {
    return sin( cos( exp( d ) ) );
}

double bar( double *v, int len );

#endif

foobar.cc:

#include "foobar.h"

double bar( double *v, int len ) {
    double sum = 0;
    for ( int i = 0; i < len; i++ ) {
        sum += foo( v[i] );
    }
    return sum;
}

simd.cc:

#include <iostream>
#include "foobar.h"

int main() {

    const int len = 100;
    double *v = new double[len];

    for ( int i = 0; i < len; i++ ) {
        v[i] = i;
    }

    double sum = 0;
    #pragma omp simd reduction( +: sum )
    for ( int i = 0; i < len; i++ ) {
        sum += foo( v[i] );
    }

    std::cout << sum << "  " << bar( v, len ) << std::endl;

    delete[] v;

    return 0;
}

compilation:

> g++ -fopenmp -g simd.cc foobar.cc
/tmp/ccI4e7ip.o: In function `_ZGVbN2v__Z3food':
foobar.h:7: multiple definition of `_ZGVbN2v__Z3food'
/tmp/cc4U8Qyu.o:foobar.h:7: first defined here
/tmp/ccI4e7ip.o: In function `_ZGVbM2v__Z3food':
foobar.h:7: multiple definition of `_ZGVbM2v__Z3food'
/tmp/cc4U8Qyu.o:foobar.h:7: first defined here
/tmp/ccI4e7ip.o: In function `_ZGVcN4v__Z3food':
foobar.h:7: multiple definition of `_ZGVcN4v__Z3food'
/tmp/cc4U8Qyu.o:foobar.h:7: first defined here
/tmp/ccI4e7ip.o: In function `_ZGVcM4v__Z3food':
foobar.h:7: multiple definition of `_ZGVcM4v__Z3food'
foobar.h:7: first defined here
/tmp/ccI4e7ip.o: In function `_ZGVdN4v__Z3food':
foobar.h:7: multiple definition of `_ZGVdN4v__Z3food'
foobar.h:7: first defined here
/tmp/ccI4e7ip.o: In function `_ZGVdM4v__Z3food':
foobar.h:7: multiple definition of `_ZGVdM4v__Z3food'
foobar.h:7: first defined here
collect2: error: ld returned 1 exit status
> c++filt _ZGVdM4v__Z3food
_ZGVdM4v__Z3food
> c++filt _Z3food
foo(double)

GCC versions 4.9.2 and 5.1.0 both give the very same problem, while the Intel compiler version 15.0.3 compiles it just fine.


Final edit:
Hristo Iliev's comment and Z boson's question reinforce my belief that my code is OpenMP compliant, and that this is a bug in GCC. I'll run further tests with the most up-to-date version I can find, and report it if needed.

Gilles
  • "So it is likely never compiled as a standalone function..." Not so sure about that. – Vladimir F Героям слава Dec 04 '15 at 16:27
  • What compile options were used when it fails? – Z boson Dec 04 '15 at 19:04
  • What happens with `-O2` instead of `-O3` or `-Ofast`? – Z boson Dec 04 '15 at 19:13
  • Hmmm. Okay, well so much for my idea but that's still interesting. Maybe it has something to do with the new SIMD math functions which are enabled with `omp simd` and `-Ofast`? – Z boson Dec 04 '15 at 19:19
  • @VladimirF Yeah, I wasn't sure either. And further tests showed that indeed, the problem comes from the function not being inlined. – Gilles Dec 04 '15 at 19:36
  • What you are thinking of is `static inline`. That does not produce a standalone function. Without `static` the compiler has to produce a stand-alone function in case another object file calls the function (i.e. the function has external linkage unless you use static). I assume if there is only one object file this does not apply. – Z boson Dec 04 '15 at 20:07
  • Yep, I just tried your code. `static inline` fixes your problem. Though that does not explain why it fails without `static`. – Z boson Dec 04 '15 at 20:30
  • Your example compiles fine with `-O3` (and `-O2` and `-O1` for that matter). – Z boson Dec 05 '15 at 19:19

1 Answer

An inline function is a rather unusual function, since it is normally inlined directly in the place it was called. So it is likely never compiled as a standalone function.

This is incorrect. A function, with or without inline, has external linkage unless it is declared static. The compiler has to produce a stand-alone version of the function (which won't be inlined) in case the function is called from another object file. If you don't want a standalone function, declare the function static. See section 8.3 under the heading "Inlined functions have a non-inlined copy" in Agner Fog's Optimizing software in C++ for more details.

Using static inline double foo does not give an error with your code.

Now let's look at the symbols. Without using static

nm foobar.o | grep foo

gives

W _Z3food
T _ZGVbM2v__Z3food
T _ZGVbN2v__Z3food
T _ZGVcM4v__Z3food
T _ZGVcN4v__Z3food
T _ZGVdM4v__Z3food
T _ZGVdN4v__Z3food

and nm simd.o | grep foo gives the same thing.

The uppercase "W" and "T" mean the symbols are external. However, "W" marks a weak symbol, which does not cause a link error when it is defined in several object files, whereas "T" marks a strong symbol, which does. This shows why the linker is complaining.

What's the result with static inline? In this case nm foobar.o | grep foo gives

t _ZGVbM2v__ZL3food
t _ZGVbN2v__ZL3food
t _ZL3food

and nm simd.o | grep foo gives the same thing. But lowercase "t" means the symbols have local linkage and so there is no problem with the linker.

If we compile without OpenMP, the only foo symbol produced is _ZL3food. I don't know why GCC produces weak symbols for the non-SIMD version of the function and strong symbols for the SIMD versions, so I can't completely answer your question, but I thought this information would be interesting nevertheless.

Z boson
  • Inline functions have to be defined in every compilation unit in which they are used. Since they must also be emitted out-of-line, duplicate symbols will occur with more than one compilation unit. The solution is to make the symbols weak. To me it looks like the GCC SIMD-iser is doing a sloppy job and not marking the out-of-line vectorised versions as weak. Probably a compiler bug or undefined behaviour. – Hristo Iliev Dec 05 '15 at 11:02
  • @HristoIliev, I agree it looks like it may be a bug. In case you are interested, this question inspired [my own question](http://stackoverflow.com/questions/34110703/inlined-functions-have-a-non-inlined-copy). BTW, the OP's example code compiles fine with optimization enabled or using `static inline`. – Z boson Dec 05 '15 at 21:43
  • The optimiser is probably pruning the AST and removing all of the unused code once the function code gets inlined. – Hristo Iliev Dec 05 '15 at 21:57
  • @HristoIliev and Z boson, I think that this pretty much answers my main question, which was whether my code was standard compliant. So for me, this is a GCC bug and I'll report it. Thanks guys. – Gilles Dec 06 '15 at 06:53
  • @Gilles, thank you for your question! I actually cleared up some misunderstandings I had about inline (though I still have a few things to learn about it) and now understand C++ better due to your question. – Z boson Dec 06 '15 at 09:37
  • I guess we all did :) – Hristo Iliev Dec 06 '15 at 12:28