Why is it possible to override symbols from some static libraries but not others?

Question

I'm working on a tool that links against Clang, and I need to implement a small number of changes to some operations. To improve development times, instead of rebuilding Clang, I decided to redefine the symbols of interest in my program code, and let the linker take care of the rest: in most cases, the program version of a symbol that is defined in both program code and a static library takes precedence at link-time without a fuss. (The linked answer relates to Linux, but I found that to work on macOS too–usually.)

This worked great when I was using the stock Clang build for macOS that can be downloaded from the LLVM website. However, I am currently trying to switch to my company's customized Clang (which I built once from source, and hoped to further modify in the same way), and now I get duplicate symbol errors.

I don't know what is causing this issue. My project's linker flags have remained unchanged (save for one new static library): importantly, they do not contain -all_load or its -force_load cousin, which tell the linker to try to include every symbol defined in static libraries. The symbols that I'm trying to override look defined the same way when I check them with nm in the stock archive and in the custom archive. The difference has to be with how I built LLVM, but just knowing that doesn't really help me figure out what I need to change.

For instance, say that I want to redefine clang::Qualifiers::getAsString() const. I could do that just fine using the stock LLVM libraries, but now I would get a duplicate symbol error:

duplicate symbol __ZNK5clang10Qualifiers11getAsStringEv in:
  .../Objects-normal/x86_64/TypePrinter.o
  clang+llvm-internal/lib/libclangAST.a(TypePrinter.cpp.o)

Using nm -f darwin to inspect both archives, I would get very similar results for __ZNK5clang10Qualifiers11getAsStringEv:

# clang+llvm-6.0.0/lib/libclangAST.a
                 (undefined) external __ZNK5clang10Qualifiers11getAsStringEv
0000000000000bb0 (__TEXT,__text) external __ZNK5clang10Qualifiers11getAsStringEv


# clang+llvm-internal/lib/libclangAST.a
                 (undefined) external __ZNK5clang10Qualifiers11getAsStringEv
0000000000000d00 (__TEXT,__text) external __ZNK5clang10Qualifiers11getAsStringEv

So, assuming more or less identical symbol definitions, and identical linker flags, why was I able to override static library symbols this way before, and why am I no longer able to?

zneak · Accepted Answer · 2018-08-14T20:30:32.480

This part of the premise isn't quite correct:

In most cases, the program version of a symbol that is defined in both program code and a static library takes precedence at link-time without a fuss. (The linked answer relates to Linux, but I found that to work on macOS too–usually.)

The linked answer appears correct, but I originally misunderstood it. The actual behavior, as evidenced by passing -Wl,-why_load to Clang (or -why_load to the linker), goes as follow:

if a symbol is referenced, try to find its definition in the program code.
- if it is defined in the program code, you're done; do not search static libraries.
if it is not defined in the program code, look up the .__SYMDEF file in the static library to know which object file has it.
use all the definitions from that object file.

The issue was that switching to the custom Clang, I accidentally pulled in references to symbols that were defined in the same object file as symbols that I am redefining, causing the linker to see both definitions. I was able to solve the problem by using the -why_load argument to the linker, and then looking for which symbol caused the problem object file to be loaded. I then duplicated the definition of that symbol to my program, and now the linker doesn't complain anymore.

The morale of the story is that this technique isn't as reliable on macOS as it is on Linux, and that if you do it, you kind of have to go all in. It's better to take the entire source file and copy it to your project than to try to pick symbols piecewise.

ead · Answer 2 · 2018-08-15T06:35:48.343

Actually this behavior is the same for Linux, see this reproducer:

First case: build library where symbols are in different object files:

//val.cpp  - contains needed symbol
int val=42;

//wrong_main.cpp - contains duplicate symbol
int main(){
   return 21;
}

>>> g++ -c val.cpp -o val.o
>>> g++ -c wrong_main.cpp -o wrong.o
>>> ar rcs libsingle.a val.o wrong.o

Linking against this library works, no multiple definition of main-error is issued, because no symbols at all are used from the object file wrong_main.o at all:

//main.cpp
extern int val;
int main(){
   return val;
} 

>>> g++ main.cpp -L. -lsingle -o works

Second case: both symbols are in the same object file:

//together.cpp  - contains both, needed and duplicate, symbols
#include "val.cpp"
#include "wrong_main.cpp"

>>> g++ -c together.cpp -o together.o
>>> ar rcs libtogether.a all.o

Linking against libtogether.a doesn't work:

>>> g++ main.cpp -L. -ltogether -o doesntwork
./libtogether.a(all.o): In function `main':
all.cpp:(.text+0x0): multiple definition of `main'
/tmp/cc38isDb.o:main.cpp:(.text+0x0): first defined here
collect2: ld returned 1 exit status

The linker takes either the whole object file from a static library or nothing. In this case val is needed and so the object file together.o will be taken, but it also contains the duplicate symbol main and thus the linker issues an error.

A great description how the linker works on Linux (and very very similar on MacOS) is this article.

Why is it possible to override symbols from some static libraries but not others?

2 Answers2