
My team recently upgraded from the 2015 Intel compiler (Parallel Studio) to the 2018 version, and we're having a linker issue that has everyone tearing their hair out.

I have the following class (moderately redacted for brevity), which wraps subprocesses and the file descriptors used to talk to them:

#include <map>
#include <set>
#include <string>
#include <vector>

class SubprocWrapper
{
public:
    static const int PASSTHRU_FD = 0;
    static const int MAKE_PIPE = -1;

    typedef std::map<std::string, std::string> EnvMapType;

    static EnvMapType getMyEnv();

    SubprocWrapper(
        int stdin_fd_req,
        int stdout_fd_req,
        int stderr_fd_req,
        const std::string & execPath,
        const std::vector<std::string> & args,
        const std::set<int> & dont_close_fds,
        const EnvMapType * env = 0);
};

I am then invoking it with the following code:

std::string runCmd = "/run/some/file.bin";
std::vector<std::string> args(2);
args[0] = "-c";
args[1] = runCmd;

SubprocWrapper::EnvMapType env_vars = SubprocWrapper::getMyEnv();

SubprocWrapper subproc(
    SubprocWrapper::PASSTHRU_FD,
    SubprocWrapper::PASSTHRU_FD,
    SubprocWrapper::PASSTHRU_FD,
    std::string("/bin/sh"),
    args,
    std::set<int>(), // an empty dont_close_fds set means "close all fds"
    &env_vars
);

On both 2015 and 2018 Intel compilers, the above code compiles just fine.

However, in the 2018 Intel compiler, the above code fails to link, whereas in the 2015 Intel compiler, it links just fine.

The error seems to be that the linker is unable to find the constructor symbol, as I get the following error(s):

    SourceFile.o: in function <MangledName> SourceFile.hh:<LineNum>: undefined reference to
    `SubprocWrapper::SubprocWrapper(int, int, int, std::string const&,
     std::vector<std::string, std::allocator<std::string> > const&,
     std::set<int, std::less<int>, std::allocator<int> > const&,
     std::map<std::string, std::string, std::less<std::string>,
     std::allocator<std::pair<std::string const, std::string> > > const*)'

Note that the SubprocWrapper class is compiled into a .a file and linked statically into the code that invokes it. Running nm on the produced .a file seems to confirm that a symbol for the SubprocWrapper constructor exists, yet the code only links under 2015, even when the .a itself was compiled with 2018. We have confirmed that the correct .a file is being passed on the linker line (our build process has not changed), and we have tried moving the .a around in the link order, to no avail.

Running nm on the .a file I'm linking against and on SourceFile.o shows the following symbols for the constructor:

lib.a:

0000000000001020 T _ZN4beau5posix14SubprocWrapperC1EiiiRKSsRKSt6vectorISsSaISsEERKSt3setIiSt4lessIiESaIiEEPKSt3mapISsSsSA_ISsESaISt4pairIS2_SsEEE
0000000000000084 r _ZN4beau5posix14SubprocWrapperC1EiiiRKSsRKSt6vectorISsSaISsEERKSt3setIiSt4lessIiESaIiEEPKSt3mapISsSsSA_ISsESaISt4pairIS2_SsEEE$$LSDA
0000000000001010 T _ZN4beau5posix14SubprocWrapperC2EiiiRKSsRKSt6vectorISsSaISsEERKSt3setIiSt4lessIiESaIiEEPKSt3mapISsSsSA_ISsESaISt4pairIS2_SsEEE

SourceFile.o:

U _ZN4beau5posix14SubprocWrapperC1EiiiRKSsRKSt6vectorISsSaISsEERKSt3setIiSt4lessIiESaIiEEPKSt3mapISsSsSA_ISsESaISt4pairIKSsSsEEE

Note that the two mangled names for the constructor don't match!

I assume the code is standards-compliant since it compiles fine; it just fails to link. The code hasn't changed in probably 7 years or so, and every previous version of the Intel compiler has handled it just fine.

Why does this code work under Intel 2015 but not 2018?

Operating system: RHEL 7.4
GCC version: 4.8.5
libstdc++ version: 4.8.5

Intel compiler versions: 
2015.3.187 (Works!)
2018.1.163 (Fails to link!)

Edit: This problem seems to be a race condition of some kind. We also saw it with the Intel 2015 compiler when building with a very large number of parallel jobs (make -j30). When the problem cropped up, we inspected the objects and found that one of the .o files had indeed been compiled with the KSs version of the symbol rather than the IS2_ version. After a few expletives were exchanged, we removed the offending .o file and recompiled without parallel jobs (plain make), and found, to our surprise, that this time the compiler generated a different symbol for the function (the IS2_ version). This is extremely weird, as the compiler should always generate the same symbol regardless of the number of jobs. Unfortunately, the behavior is not readily reproducible: we did a make clean and ran with a high job count again, only to find that it worked that time.

stix
  • What C++ standard are you compiling against, and what version of gcc are you pulling your `libstdc++` headers from? 2018 might be mangling the name of the `std::string` class differently due to an ABI change that came around the gcc 5.x series, IIRC. – Jason R Mar 09 '18 at 17:50
  • Both compilers are using GCC 4.8.5 for their libstdc++. Will update the post to reflect this. – stix Mar 09 '18 at 17:51
  • "I assume the code is standards compliant since it compiles fine" - that's a false assumption. Different compilers often compile non-compliant code (for various reasons; they implement language extensions, they have bugs, your code contains undefined behaviour and the compiler is then allowed to do *anything*, etc.). Just because something *compiles* does *not* mean that it is *correct* or *valid* or *compliant*. – Jesper Juhl Mar 09 '18 at 17:59
  • @HansPassant Why is the linker looking for a constructor lacking the EnvMapType argument when I'm clearly trying to use a constructor that does have the EnvMapType argument? – stix Mar 09 '18 at 18:08
  • Total shot into the blue, but try defining the constructor as verbosely as possible; maybe the compiler uses different allocators/comparators in declaration and invocation: `typedef std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<const std::string, std::string> > > EnvMapType;` `SubprocWrapper(int stdin_fd_req, int stdout_fd_req, int stderr_fd_req, const std::string & execPath, const std::vector<std::string, std::allocator<std::string> > & args, const std::set<int, std::less<int>, std::allocator<int> > & dont_close_fds, const EnvMapType * env = 0);` – Max Vollmer Mar 09 '18 at 18:09
  • @HansPassant OP said the code hasn't changed in 7 years. Also not sure where you get the idea from that the constructor is lacking the EnvMapType argument, what makes you think so? – Max Vollmer Mar 09 '18 at 18:13
  • Look at the linker error message, find EnvMapType back in that message. You have to scroll, making it extra likely to not see that it isn't there :) – Hans Passant Mar 09 '18 at 18:15
  • EnvMapType is typedef'd, and the typedef resolves to what the linker error is showing, namely a map type. – stix Mar 09 '18 at 18:16
  • @HansPassant EnvMapType is in the error message. It's the last parameter: `std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >`. – Max Vollmer Mar 09 '18 at 18:17
  • _Doing an nm on the produced .a file seems confirm a symbol for the SubprocWrapper constructor exists_ - without demangling, do you see **exactly** the same symbol defined in the .a, and undefined in `SourceFile.o`? – Useless Mar 09 '18 at 19:44
  • @Useless Yes, I have verified the symbols are in both the .a and .o file and are identical. Updated question to show this. – stix Mar 09 '18 at 20:12
  • Can you compare linker settings/parameters for both Intel versions? The gcc command lines used would shed some light on this. – Ripi2 Mar 09 '18 at 20:26
  • @Ripi2 Linker settings and parameters are unchanged between both intel versions. The only difference in our build process is sourcing environments for the 2018 compiler instead of the 2015 compiler. – stix Mar 09 '18 at 20:29
  • Perhaps some Intel-defaults have changed. That's why I suggest looking at gcc commands. – Ripi2 Mar 09 '18 at 20:30
  • @Useless Sorry, I misunderstood your question, the .a and .o files *DO NOT* match mangled names. However, when you UNMANGLE both mangled names, they match unmangled... Weird. – stix Mar 09 '18 at 20:54
  • OK, so this means the two are somehow getting mangled differently. You need to compare compiler flags when building the static lib to those used when linking – Useless Mar 09 '18 at 21:21

1 Answer


Sorry, this is not actually an answer, a mere comment, but I couldn't fit it into the comment box.

If you analyze the two mangled names, you'll find that there's only one fragment where they differ: KSs vs S2_. KSs means std::string const (or, more exactly, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const). S2_ is a compressed reference to the 4th substitutable component of the name encountered earlier, which actually is the same type. Looks like the GNU mangling scheme is ambiguous in this matter, as it doesn't prescribe where compressed references are or aren't to be used and merely describes them as an available option.

However, a compiler that doesn't mangle names consistently looks like a bug to me, since that breaks the linking of legitimate code. You should report it to the compiler vendor.

ach
  • I am actually afraid this is a bug too, which is somewhat ironic as we're upgrading to Intel 2018 because 2015 has a bug on RHEL 7.4... – stix Mar 12 '18 at 14:21