
I need to build a portable shared object that serves as a plugin for another piece of software on Linux. After some reading, I came to the conclusion that I should build a sysrooted gcc (gcc 5.4.0, if it matters) against a reasonably old glibc (for compatibility with older systems) and link with -static-libstdc++ and -static-libgcc, so that the result depends only on the host's glibc and a few other minor libraries that will always be present.

Now, I did all that and I am experiencing a weird crash: a segmentation fault happens where the code constructs a std::thread, and gdb shows that the stack frame is inside libstdc++.so.6 (where it shouldn't be; ldd of my shared object does not list libstdc++.so either). The top of the stack at the crash is:

#0  0x0000000000000000 in ?? ()
#1  0x00007ffff79075e3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 # THIS SHOULD NOT BE HERE RIGHT?
#2  0x00007ffff5a25a5c in std::thread::thread<void (ReferenceAnalytics::*)(std::timed_mutex&), ReferenceAnalytics*&, std::reference_wrapper<std::timed_mutex> >
    (this=0x7fffffffcf40, __f=
    @0x7fffffffcf60: (void (ReferenceAnalytics::*)(ReferenceAnalytics * const, std::timed_mutex &)) 0x7ffff5a1750c <ReferenceAnalytics::WorkerThreadMethod(std::timed_mutex&)>)
    at /home/developer/Toolchains/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/include/c++/5.4.0/thread:137 # Looks like my toolchain

So, I did some more reading, and then discovered using nm that my shared object has all the std::thread stuff (ctor, dtor, swap, ...) defined as weak symbols. I assume this causes a collision if the host that loads the plugin uses the dynamic libstdc++: my calls are then routed there and all hell breaks loose. Is this right?

Further googling and reading did not tell me how to control this, that is, how to force the std::thread symbols to be resolved to the static libstdc++ from my sysrooted gcc.

Moreover, I made a small executable that just does dlopen on my shared object and then calls a method which internally constructs the thread. If the executable is also built with -static-libstdc++, all is well; if not, the crash happens. So I assume my theory about the weak symbol for std::thread being resolved to the host's libstdc++ is correct, but how do I solve this?
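For reference, the test executable described above can be sketched roughly as follows. The function name `load_and_call` and the assumption that the plugin exports an `extern "C"` function taking no arguments are hypothetical; the real plugin interface is not shown in this post.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlerror
#include <cstdio>

// Loads the shared object at `path`, looks up `symbol` (assumed here to be an
// extern "C" function of type void()), and calls it.
// Returns true on success, false if loading or symbol lookup failed.
bool load_and_call(const char* path, const char* symbol) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return false;
    }

    // dlsym returns void*; converting it to a function pointer is the
    // usual idiom on POSIX systems.
    using entry_fn = void (*)();
    entry_fn entry = reinterpret_cast<entry_fn>(dlsym(handle, symbol));
    if (entry == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return false;
    }

    entry();
    // The library is intentionally left loaded: the plugin may have started
    // threads that still execute its code.
    return true;
}
```

Calling `load_and_call("./libplugin.so", "plugin_start")` from `main` (both names are placeholders) and linking the executable once with and once without -static-libstdc++ reproduces the two cases. On older glibc versions the executable also needs -ldl.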

Rudolfs Bundulis

1 Answer


If you statically link a DSO against libstdc++ without hiding the libstdc++ symbols, and the main program is linked against libstdc++ as well, then the symbol definitions in the main program will interpose/preempt the definitions in the DSO when it is opened with dlopen.

However, because the main program is not linked against libpthread, the system libstdc++ DSO in the process image saw that the libpthread symbols were unavailable (null) and thus disabled thread support. Your DSO needs thread support, but cannot get it from the system libstdc++.
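To illustrate how a library can end up with "null" pthread symbols, here is a minimal sketch of the general weak-reference technique. It is not libstdc++'s actual source, and the helper names `pthread_is_bound` and `run_on_thread` are made up for this example. A weak reference that nothing defines at bind time has a null address, and calling through it is a jump to address 0, which matches frame #0 in the backtrace from the question.

```cpp
#include <pthread.h>

// Turn both functions into weak references. If no loaded object defines them
// when the symbols are bound, their addresses are null instead of the link
// or load failing.
#pragma weak pthread_create
#pragma weak pthread_join

// True when the pthread functions were bound to real definitions.
bool pthread_is_bound() {
    return &pthread_create != nullptr && &pthread_join != nullptr;
}

// Runs fn(arg) on a new thread and waits for it. Returns false, instead of
// calling through a null pointer, when thread support is not available.
bool run_on_thread(void* (*fn)(void*), void* arg) {
    if (!pthread_is_bound()) {
        return false;
    }
    pthread_t thread;
    if (pthread_create(&thread, nullptr, fn, arg) != 0) {
        return false;
    }
    return pthread_join(thread, nullptr) == 0;
}
```

Whether `pthread_is_bound()` returns true depends on the process: with an older glibc and a main program that does not link libpthread it is false, while on recent glibc versions, where the pthread functions live in libc itself, it is true.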

As an immediate workaround, you can hide all the statically linked libstdc++ symbols in the DSO. Then no interposition will take place, and your DSO will actually use the libstdc++ copy in the DSO itself rather than the system libstdc++, which has already established that there should not be any thread support in the process.
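A minimal sketch of what such a plugin could look like, assuming a single exported entry point. The names `plugin_start`, `plugin.cpp`, `plugin.map` and `libplugin.so` are hypothetical, and the build lines in the comments are one possible way to do the hiding with GNU ld, not necessarily the only or best one.

```cpp
#include <atomic>
#include <thread>

namespace {
std::atomic<int> g_runs{0};

void worker() { g_runs.fetch_add(1); }
}  // namespace

// The only symbol meant to be visible to the host application.
extern "C" __attribute__((visibility("default"))) int plugin_start() {
    std::thread t(worker);  // should bind to the libstdc++ copy inside the DSO
    t.join();
    return g_runs.load();
}

// Build sketch (assumed file names):
//   g++ -std=c++11 -fPIC -fvisibility=hidden -shared plugin.cpp
//       -o libplugin.so -static-libstdc++ -static-libgcc -pthread
//       -Wl,--version-script=plugin.map
//
// plugin.map, an anonymous version script that exports one symbol and makes
// everything else local, including symbols pulled in from libstdc++.a:
//   { global: plugin_start; local: *; };
//
// -Wl,--exclude-libs,ALL is an alternative that hides symbols coming from
// archives, but it does not cover weak template instantiations emitted into
// the plugin's own object files.
```

After linking, `nm -D --defined-only libplugin.so` should list only the intended entry point if the hiding worked.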

But this will likely not solve all of your problems because late loading of libpthread via dlopen has its problems. We fixed one bug here:

But your distribution may not have that fix, and I expect there will be other issues. Note that the second, statically linked copy of libstdc++ is actually needed here: the system libstdc++ was loaded without thread support (libpthread was not loaded when its symbols were bound, which caused the crash you observed), so you cannot use it for creating threads. The system copy has also activated optimizations that make it not thread safe (avoiding atomic instructions and things like that).

Florian Weimer
  • Ok, thanks, this makes a lot of sense, I'll try that out. Still at least an upvote. Am I being too paranoid, and should I maybe just drop the idea of using a static libstdc++? Is libstdc++ backwards compatible, and would a toolchain with, let's say, gcc 5.4.0 work for modern distros? – Rudolfs Bundulis Jul 25 '18 at 20:49
  • I updated the answer to mention that you actually need the second copy of libstdc++ because of the delayed loading of libpthread. – Florian Weimer Jul 25 '18 at 20:58
  • GCC 5 is too recent for wide distribution support, and libstdc++ is backwards-compatible, but not forwards-compatible, so you need to build on the oldest supported distribution. Some toolchains use a hybrid linkage model where only newer libstdc++ bits are linked statically, so that applications run with the old, unchanged system libstdc++ from the system compiler. However, the harder (read: as of yet unsolved) problem is the `dlopen` of libpthread in a process which previously did not load libpthread, and an expectation that `pthread_create` will work. – Florian Weimer Jul 25 '18 at 21:01
  • Many thanks for shedding light on the internals of this :) Now at least I feel I understand what is going on. – Rudolfs Bundulis Jul 25 '18 at 21:03
  • Thank you very much, stuff seems to work now. One more thing: I implemented the hiding with a linker script, but many articles I read said that `-fvisibility=hidden` is superior. If I understand correctly, `-fvisibility=hidden` (which I am already using when compiling my sources) does not control the visibility of symbols that come from an archive (like `libstdc++.a` in this case), and thus a linker script is the only way here, right? – Rudolfs Bundulis Jul 26 '18 at 08:12