
Suppose I have a project myproj with a single executable myproj_exec that depends on code in hundreds of files. Let's choose C++ as the language (although it could be C, or perhaps any of a number of other compiled languages). Some of the files are more closely related than others, but they are not used together cohesively enough to be spun off into separate libraries. Suppose also that these files are scattered across a tree of subdirectories.
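
For concreteness, a top-level CMakeLists.txt for such a project might look roughly like this (the directory and file names here are made up):

    # top-level CMakeLists.txt (hypothetical layout)
    cmake_minimum_required(VERSION 3.13)
    project(myproj CXX)

    # the target must exist before the subdirectories can add sources to it
    add_executable(myproj_exec main.cpp)

    add_subdirectory(subdir_a)
    add_subdirectory(subdir_b)
    # ... many more subdirectories, hundreds of files in total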

Now, when building my executable - or perhaps when authoring the CMakeLists.txt files for my project - I have (at least) two options for how to define the relationship between these source files and the myproj_exec target:

  1. Adding each of the files directly as sources of the executable target, e.g. in each subdir have (see also the version caveat after this list):

    # CMakeLists for subdir
    target_sources(myproj_exec PRIVATE subdir_file_1.cpp subdir_file_2.cpp)
    
  2. Defining intermediate, per-subdirectory library targets, and having the executable depend on these intermediates, e.g.

    # CMakeLists for subdir
    add_library(some_sub_dir_artificial_tgt STATIC)
    target_sources(some_sub_dir_artificial_tgt PRIVATE subdir_file_1.cpp subdir_file_2.cpp)
    target_link_libraries(myproj_exec PRIVATE some_sub_dir_artificial_tgt)
    

    or perhaps the last line isn't included, and in the top-level CMakeLists.txt we would have something like:

    target_link_libraries(myproj_exec PRIVATE dir_lib_1 dir_lib_2 dir_lib_3)
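
One caveat that applies to the per-subdirectory variants above (to the best of my knowledge of CMake's version history): target_link_libraries() can only be called on a target defined in a different directory as of CMake 3.13, which is one reason to prefer the top-level variant. target_sources() can be called from a subdirectory even on older CMake, but before 3.13 (policy CMP0076) relative paths were interpreted relative to the directory where the target was defined, so the usual workaround was to spell out the full paths, e.g.:

    # CMakeLists for subdir, portable to pre-3.13 CMake
    target_sources(myproj_exec PRIVATE
        ${CMAKE_CURRENT_SOURCE_DIR}/subdir_file_1.cpp
        ${CMAKE_CURRENT_SOURCE_DIR}/subdir_file_2.cpp
    )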
    

What considerations would you make in choosing between these two approaches?

Notes:

  • Assume that there isn't a significant motivation for defining these libraries other than in the context of building myproj_exec.
  • These artificial targets are not intended to be shared libraries, but rather to be linked statically into the executable.
einpoklum

2 Answers


As for gratuitously introducing static libraries: I can't think of a good reason to do this.

For shared libraries: Each shared library is linked on its own. By clever partitioning of the code base into shared libraries, you may be able to significantly reduce load on the linker, with potential benefits for your build times. This requires deliberate engineering effort though and also has potential downsides (in particular a risk of increased startup times for your application).
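
In terms of the question's snippets, trying that out is roughly a one-keyword change per target; a minimal sketch, reusing the question's hypothetical names:

    # hypothetical: build one subdirectory's code as a shared library instead
    add_library(some_sub_dir_artificial_tgt SHARED subdir_file_1.cpp subdir_file_2.cpp)
    target_link_libraries(myproj_exec PRIVATE some_sub_dir_artificial_tgt)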

ComicSansMS

I also have a hard time imagining why doing this would be useful.

The combination of the following two statements is confusing to me:

What considerations would you make in choosing between these two approaches?

Assume that there isn't a significant motivation for defining these libraries other than in the context of building myproj_exec.

I personally wouldn't even spend the time thinking deeply about it if there wasn't some other good reason to have libraries.

I've understood the reason for creating libraries to be primarily about reusability, which doesn't seem to be a concern here.

In terms of build speed, it would probably depend on the nature of the linker, object format, and binary format. For example, I suppose that if you had a linker whose speed scaled worse than linearly with the "size" of its input, this sort of divide-and-conquer could be useful, provided the buildsystem can tell when a library doesn't need to be rebuilt and you rarely make changes that touch many of those theoretical libraries. There could also be extra work for the linker in handling things like function-local static object addresses and inline function addresses. And if you use something like Link Time Optimization, that discussion of possible speed gains probably goes out the window.
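
(If you want to experiment with that last point, CMake exposes LTO through the INTERPROCEDURAL_OPTIMIZATION target property and the CheckIPOSupported module; a minimal sketch, using the question's target name:)

    # enable IPO/LTO for the executable if the toolchain supports it
    include(CheckIPOSupported)
    check_ipo_supported(RESULT ipo_supported OUTPUT ipo_error)
    if(ipo_supported)
        set_property(TARGET myproj_exec PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
    else()
        message(STATUS "IPO/LTO not supported: ${ipo_error}")
    endif()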

In terms of disk space, you might get some extra usage from these intermediate libraries, due to duplicated object code for things like functions defined in common headers.

In the end, you just have to measure to see what actual difference you get.

But that brings me to how much work it would be to actually make the change. Maybe take some time to consider how much benefit you think you might possibly get from such a refactoring before spending your time doing it.


However, if we cross out the weird constraint in the question post that says

Assume that there isn't a significant motivation for defining these libraries other than in the context of building myproj_exec.

then there is a good reason why someone might want to do this, which I already mentioned: reusability, or rather, defensive / preparatory reusability. There are people who hold the position that it's good practice to write your executables as somewhat thin wrappers around libraries that hold the business-logic-y things (Herb Sutter, if I recall correctly; my memory is foggy on this). That way, if you ever have reason to reuse some subcomponents, you've already done the work to make that much easier on yourself. For example, if you want to write tests against a library instead of against an executable.
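
A minimal sketch of that layout, with hypothetical names:

    # business logic lives in a library...
    add_library(myproj_lib STATIC logic.cpp)
    # ...the executable is just a thin wrapper around it...
    add_executable(myproj_exec main.cpp)
    target_link_libraries(myproj_exec PRIVATE myproj_lib)
    # ...and tests can link the same library without going through the executable
    add_executable(myproj_tests test_logic.cpp)
    target_link_libraries(myproj_tests PRIVATE myproj_lib)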

There are likely other uses that I'm not aware of.

starball
  • TBH - I'm looking at an existing codebase which has gone the route of multiple intermediate libraries and I'm trying to understand whether that's justifiable. – einpoklum Jul 13 '23 at 10:52
  • @einsupportsModeratorStrike see edit. And you should mention that in your question post. It's also unclear to me how accurate it is for you to speak on that codebase's behalf in saying that there "_isn't a significant motivation for defining these libraries other than in the context of building myproj_exec_" – starball Jul 13 '23 at 20:08