C and C++ static linking: just a copy?

Question

When someone statically links a .lib, will the linker copy the whole contents of lib into the final executable or just the functions used in the object files?

score 21 · Accepted Answer · answered Mar 16 '11 at 22:54


The whole library? -- No.
Just the functions you called? -- No.
Something else? -- Yes.

It certainly doesn't throw in the whole library.

But it doesn't necessarily include just "the functions used in the object files" either.

The linker will make a recursively built list of which object modules in the library satisfy your undefined symbols.

Then, it will include each of those object modules.

Typically, a given object module will include more than one function, and if some of these are not called by the ones that you do call, you will get some number of functions (and data objects) that you didn't need.

answered Mar 16 '11 at 22:54

DigitalRoss

+1 this answer is the most correct and highlights the pitfalls library authors often fall into that make static linking lead to massive file sizes. – R.. GitHub STOP HELPING ICE Mar 16 '11 at 23:00
I believe the best answer is *depends on the linker*. I've run across linkers that throw in the entire library because it speeds up the build process (and the linkers were cheap). I've used others that only extracted the functions out of an object module; not the entire module or library. – Thomas Matthews Mar 16 '11 at 23:03
@Thomas: it might even depend on the configuration/arguments passed to the said linkers. I would not be surprised that it throws in the whole library in debug build, but produces a tighter (stripped) version for the release one. – Matthieu M. Mar 17 '11 at 07:16
I believe that linkers typically provide optimization flags specifically to force the linker to remove dead code, since they don't do it automatically. GCC and MSVC do anyways. See my post below. – J T Mar 17 '11 at 17:54
@Thomas. I disagree. While it would be possible to design an object format where individual functions could be left out, all popular object formats on all common operating systems today are derived from COFF and ELF. In these formats, nothing can be removed from text or data in any one module because you don't know the location of every reference; only inter-module references can be adjusted. I've never seen a linker throw in a whole library; that would be a top-priority bug to be fixed for sure. – DigitalRoss May 02 '11 at 20:36

score 6 · Answer 2 · edited Jun 06 '16 at 10:52

The linker typically does not remove dead code before building the final executable. That is, it will (usually) link in ALL symbols from the object files it pulls in, whether they are used in the final executable or not. However, linkers often provide explicit optimization settings that make them try harder to discard unused code.

For GCC, this is accomplished in two stages:

First, compile the source, telling the compiler to place each function and each data item in its own section within the object file. This is done with the following two compiler flags:

-fdata-sections -ffunction-sections

Then link the object files together using the linker optimization flag (this causes the linker to discard unreferenced sections):

-Wl,--gc-sections

So if you had one file called test.cpp that had two functions defined in it, but one of them was unused, you could omit the unused one with the following g++ command:

g++ -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections

(Note that -Os is an additional compiler flag that tells GCC to optimize for size)

As for MSVC, function-level linking accomplishes the same thing. I believe the compiler flag for this (to sort things into sections) is:

/Gy

And then the linker flag (to discard unused sections):

/OPT:REF

score 4 · Answer 3 · answered Mar 16 '11 at 22:32

Linkers were invented in ancient times, when memory was especially precious. One of their primary functions was to prune out the modules you weren't using. That ability has been carried forward to the present day.

It's quite common for some library functions to rely on others though, and all the dependencies will be linked.

answered Mar 16 '11 at 22:32

Mark Ransom

Not necessarily. Because computers are faster and memory (including hard drives) is cheaper, linkers can afford to be fast and lazy and dump whole libraries into executables rather than spending time following dependency trees. Low-cost linkers are lazier. – Thomas Matthews Mar 16 '11 at 22:36
@Thomas, I suppose it's possible. Do you have an example? Also, what if one of the functions you're not using has a dependency on a library you're not including? – Mark Ransom Mar 16 '11 at 22:39
For example, a Metaware compiler I used said that if one function was required in a library that it would include the entire library. It was based on "translation unit" resolution. If I made a library from a single source file that contained 5 functions and a main program that executed only one of the 5 functions, the entire library would be included. – Thomas Matthews Mar 16 '11 at 22:50
What should I look for in the documentation if I want to find out how my linker behaves? – George Mar 16 '11 at 22:51
@Thomas, that's actually the common scenario. When I said "module" what I should have said was "translation unit". It was also quite common back in the day for every function to be in its own translation unit. – Mark Ransom Mar 16 '11 at 23:29
@noname: read about how the linker resolves symbols. If not, try my example posted elsewhere in this thread. – Thomas Matthews Mar 16 '11 at 23:32

score 0 · Answer 4 · answered Mar 16 '11 at 22:30

Sort of. It will, however, also need to fix up all the function call pointers, especially if those function calls target code outside of the static library (i.e. in another static library or the executable).

answered Mar 16 '11 at 22:30

Goz

Do you want apple pie or strawberry cake? Sort of. – orlp Mar 16 '11 at 22:31
@nightcracker: I say "sort of" because a major part of the link process is pretty much copying the stuff from a static lib into the final executable. Function pointer fixups also occur as can stripping. – Goz Mar 16 '11 at 22:34
What's wrong with saying "A major part of the link process is pretty much copying the stuff from a static lib into the final executable. It will however ..."? – orlp Mar 16 '11 at 22:36
The basic definition of a linker is to resolve function call pointers. Unresolved symbols are those symbols that have not been resolved yet. Doesn't answer the OP's question. – Thomas Matthews Mar 16 '11 at 22:38
@Thomas: So surely it doesn't "just" copy and hence I am answering the OP's question ... or am I missing something? – Goz Mar 16 '11 at 22:42
@Goz True, but he's asking about the behaviour of that copy I think - as in, if I have two functions `a()` and `b()` in different object files in the library and I call `a()` but `b()` is never called by the program or library in my use of it (but is in the same library as `a()`) does the linker include the objects containing `b()` anyway? – Mar 16 '11 at 22:59
@Ninefingers: If that's true then you can read between the lines much much better than I can ;) – Goz Mar 16 '11 at 23:01
@Goz the bit he's missing (DigitalRoss has got there before me) is an understanding of objects inside libraries. You get the whole object because whilst some symbols are exported, you can't just copy out that function since it might well use other functions inside that object's code. So you have to take the whole object if you have a single required symbol for that object. Remember in assembly unless you prefix something `.globl` or whatever it isn't an exported symbol, but the label can most definitely be called, jumped to or whatever. – Mar 16 '11 at 23:08
@Goz: there are linkers that copy a library or object module before resolving function addresses and pointers. By the time addresses are assigned to the functions, removing unused code may be too difficult, so unexecuted functions remain in the executable. So if `a()` and `b()` are in an object module / library and only `b()` is used, `a()` will still be in the executable. Memory is cheap; development and build time are more expensive. – Thomas Matthews Mar 16 '11 at 23:08

Macke · Answer 5 · 2011-03-17T09:23:04.000

It will include only the functions & symbols that are used (unless told otherwise, but that can be tricky).

Side issue:

This can actually be a problem if, for example, you have some classes that just register themselves with a factory. No one refers to these classes directly, so they won't be included and thus won't be registered in the factory. There are ways around this (usually by declaring some anonymous variable in the header file that references the source file).

edited Mar 17 '11 at 09:23

answered Mar 16 '11 at 22:33

Macke

A better solution would be not to use global variables. – R.. GitHub STOP HELPING ICE Mar 16 '11 at 23:02
@R: This has nothing to do with global variables. It can all be done using functions & locally scoped static variables, more or less. How would your solution to decoupling factory & products look, without having a single registerAllProducts() function? – Macke Mar 17 '11 at 09:26

Thomas Matthews · Answer 6 · 2011-03-16T23:52:31.000

Depends on the linker. Some linkers are lazy and just throw the whole library in. At the other extreme are linkers that put only the necessary code into the executable.

A sample test is to write a program that uses puts and compare with a program that uses printf. If the executables are the same size, you have more of a lazy linker.

Example:

puts_test.cpp

#include <cstdio>
using namespace std;

int main(void)
{
  puts("Hello World");  // puts appends the newline itself
  return 0;
}

printf_test.cpp

#include <cstdio>
using namespace std;

int main(void)
{
  printf("%s\n", "Hello World");
  return 0;
}

With the above example, the puts function does not require extra code for parsing format strings or converting numerics into text. This is the baseline because it requires a minimal library function.

The example using printf requires more functionality. The printf function requires parsing the format string and outputting text.

The expected result is that the printf executable should be larger than the puts executable. Most toolchains will haul in all the code for printf, including parts such as floating-point formatting, even though that code is never executed. More intelligent (and costly) toolchains will break up printf and include only the parts that are used or required. In the example above, such a toolchain should include only the code for processing text, not the code for formatting integers and floating-point values.

A lazy linker, or a debug-mode build, will copy the entire library even for the puts example, thus making the executables the same size.

Symbol comparison

The *nix platforms and Cygwin provide tools for obtaining the symbols from executables. One such utility is nm. Run nm on each executable, directing the output to a text file, and compare the two text files. With a lazy linker, both executables should contain the same symbols, although their addresses may differ (which is not important here).

edited Mar 16 '11 at 23:52

answered Mar 16 '11 at 22:34

Thomas Matthews

Ok, I'm ... suspicious. Can you name a single linker that will throw the whole lib in? And is it one that noname is likely to encounter? – DigitalRoss Mar 16 '11 at 22:51
I might ask, in this case, in what sense is this entirely-included module a *library* at all? Wouldn't it be acting just like an object module? (I suppose the entire thing could be skipped if it satisfies no undefined references...) – DigitalRoss Mar 16 '11 at 23:01
This answer's advice is wrong. The fact that a test program using `puts` and `printf` will likely be the same size has nothing to do with the linker and everything to do with the (very bad) implementation of the standard library (cough glibc) where nearly every function depends on nearly every other function, especially in stdio. – R.. GitHub STOP HELPING ICE Mar 16 '11 at 23:01
@DigitalRoss , I seem to remember that some linkers (20 years ago, so I don't remember which; maybe Microsoft LINK or Pharlap's LinkLOC) would bring in the entire library if you asked them to. So `link foo.obj bar.lib` would bring in the entire library, but `link foo.obj -lib bar.lib` would operate rationally. – Robᵩ Mar 16 '11 at 23:06
@R.: With modern computers, memory is cheap, disk space is huge and processing is very fast. Development time is where the cost is. A fast build time saves money. A fast linker speeds up a build. A fast method to link is to include all the symbols in an object file / library; whether used or not. Executable sizes are no longer as much of a concern as development time (See the C# and Java languages). Try my example with different compilers and optimization settings. – Thomas Matthews Mar 16 '11 at 23:40
@R.: Try another experiment: Create a translation unit containing two functions that print different text. Compile as a library. Create a main function that calls only one of the functions. Build. Compare with a main function that executes both functions. – Thomas Matthews Mar 16 '11 at 23:55
@Thomas: translation units and libraries are completely different things. Stop confusing the issue. A good library could be hundreds or thousands of translation units. – R.. GitHub STOP HELPING ICE Mar 17 '11 at 00:12
@Thomas, I have written two families of linkers, one for National Semiconductor, and one for Green Hills Software. It's on my public profile in various places. I kinda think I don't need to run experiments to figure out how they work. I'm sure you have a lot to contribute and you must have seen something that sent you off on this odd direction. But in the end, you seem to be defending a bad answer. Plus, I have no idea what you mean by *"Compile as a library",* especially when the topic is static linking, so I can't do your experiment. – DigitalRoss Mar 17 '11 at 04:19
