
When using the same code, simply changing the compiler (from a C compiler to a C++ compiler) will change how much memory is allocated. I'm not quite sure why this is and would like to understand it more. So far the best response I've gotten is "probably the I/O streams", which isn't very descriptive and makes me wonder about the "you don't pay for what you don't use" aspect of C++.

I'm using the Clang and GCC compilers, versions 7.0.1-8 and 8.3.0-6 respectively. My system is running on Debian 10 (Buster), latest. The benchmarks are done via Valgrind Massif.

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

The code does not change, but whether I compile it as C or as C++ changes the results of the Valgrind benchmark. The values remain consistent across compilers, however. The peak runtime allocations for the program are as follows:

  • GCC (C): 1,032 bytes (1 KB)
  • G++ (C++): 73,744 bytes (~74 KB)
  • Clang (C): 1,032 bytes (1 KB)
  • Clang++ (C++): 73,744 bytes (~74 KB)

For compiling, I use the following commands:

clang -O3 -o c-clang ./main.c
gcc -O3 -o c-gcc ./main.c
clang++ -O3 -o cpp-clang ./main.cpp
g++ -O3 -o cpp-gcc ./main.cpp

For Valgrind, I run `valgrind --tool=massif --massif-out-file=m_compiler_lang ./compiler-lang` on each compiler and language, then `ms_print` for displaying the peaks.
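
For example, for the GCC C build that amounts to (the file names are just the ones I picked):

valgrind --tool=massif --massif-out-file=m_gcc_c ./c-gcc
ms_print m_gcc_c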

Am I doing something wrong here?

  • To begin with, *how* are you building? What options do you use? And how do you measure? How do you run Valgrind? – Some programmer dude Jun 20 '19 at 18:45
  • This is C, not C++. C that compiles with a C++ compiler is just malformed C++. Regardless, C++ has more memory overhead than C as you can already tell. – bigwillydos Jun 20 '19 at 18:45
  • Hypothesis: this allocation comes from the standard library. There's more in the C++ standard library. I don't know where 73kb extra comes from, but maybe loading a shared library counts as an allocation. – Justin Jun 20 '19 at 18:47
  • 17
    If I remember correctly, modern C++ compilers have to an exception model where there is no performance hit to entering a `try` block at the expense of a larger memory footprint, maybe with a jump table or something. Maybe try compiling without exceptions and see what impact that has. Edit : In fact, iteratively try disabling various c++ features to see what impact that has on the memory footprint. – François Andrieux Jun 20 '19 at 18:48
  • 3
    When compiling with `clang++ -xc` instead of `clang`, the same allocation was there, which strongly suggests its due to linked libraries – Justin Jun 20 '19 at 18:51
  • @Justin: The C++ standard library shouldn't actually increase the resident/committed memory by much if no functions in it are called (the shared library would be mapped, but the map would never be populated). But yeah, it would increase the virtual memory allocation. That's usually not an issue, though; if you're in a position where losing ~73 KB of your address space (not actual RAM) breaks your program, you likely have other problems. That said, the C standard lib is a lot larger than 1 KB, and if that were the issue, you should see it using a lot more memory too. – ShadowRanger Jun 20 '19 at 19:00
  • @ShadowRanger Wow, massif docs don't mention what kind of memory it measures! I cannot find a mention of "mapped" or "committed" anywhere. – Yakk - Adam Nevraumont Jun 20 '19 at 19:08
  • @FrançoisAndrieux: I think it's less about modern compilers, and more about modern OS/architecture ABIs. The older x86 ABIs (at least for Windows) [used frame based exception handling](https://www.osronline.com/article.cfm%5earticle=469.htm) which had to insert prologs before every try block. Modern ABIs almost always use a table based approach that often increases the cost when an exception is thrown, in exchange for being zero overhead for `try` blocks that don't end up throwing an exception. The compiler often doesn't have a choice, it has to follow the ABI conventions. – ShadowRanger Jun 20 '19 at 19:45
  • 14
    @bigwillydos This is indeed C++, I do not see any part of the C++ specifications it breaks... Other than potentially including stdio.h rather than cstdio but this is allowed at least in older C++ version. What do you think is "malformed" in this program? – Vality Jun 21 '19 at 03:34
  • 4
    I find it suspicious that those gcc and clang compilers generate the exact same number of bytes in `C` mode and the exact same number of bytes `C++` mode. Did you make a transcription error? – RonJohn Jun 22 '19 at 00:13
  • I thought this was weird too, but no, the same amount of allocation seems to be done. The answer provided explains it's from linking the standard library, which I assume would remain the same between compilers on the same system, which might explain it. –  Jun 22 '19 at 03:25

2 Answers


The heap usage comes from the C++ standard library. It allocates memory for internal library use on startup. If you don't link against it, there should be zero difference between the C and C++ version. With GCC and Clang, you can compile the file with:

g++ -Wl,--as-needed main.cpp

This instructs the linker not to link against unused libraries. Since your example code does not use the C++ library, the binary should not be linked against the C++ standard library.
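
One quick way to check this (on a typical Linux setup; the output file name here is arbitrary) is to look at the dynamic dependencies of the resulting binary:

g++ -O3 -Wl,--as-needed -o cpp-gcc-asneeded ./main.cpp
ldd ./cpp-gcc-asneeded   # libstdc++.so should no longer show up in the list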

You can also test this with the C file. If you compile with:

gcc main.c -lstdc++

The heap usage will reappear, even though you've built a C program.
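
To measure it the same way as before, something like this should show the jump (the output names here are just examples):

gcc -O3 -o c-gcc-stdcpp ./main.c -lstdc++
valgrind --tool=massif --massif-out-file=m_gcc_c_stdcpp ./c-gcc-stdcpp
ms_print m_gcc_c_stdcpp   # the peak should now be close to the ~74 KB C++ figure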

The heap use obviously depends on the specific C++ library implementation you're using. In your case, that's the GNU C++ library, libstdc++. Other implementations might not allocate the same amount of memory, or they might not allocate any memory at all (at least not on startup.) The LLVM C++ library (libc++), for example, does no heap allocation on startup, at least on my Linux machine:

clang++ -stdlib=libc++ main.cpp

The heap use is the same as when not linking against it at all.

(If compilation fails, then libc++ is probably not installed. The package name usually contains "libc++" or "libcxx".)
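
If you want to verify that with Massif as well, the same kind of run works here too (output names are again arbitrary):

clang++ -O3 -stdlib=libc++ -o cpp-clang-libcxx ./main.cpp
valgrind --tool=massif --massif-out-file=m_clang_libcxx ./cpp-clang-libcxx
ms_print m_clang_libcxx   # on my machine this shows no startup heap allocation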

Nikos C.
  • On seeing this answer, my first thought is, "_If this flag helps reduce unneeded overhead, why isn't it on by default?_". Is there a good answer to that? – Nat Jun 21 '19 at 03:48
  • @Nat My guess is at dev time it is slower to compile. When you are ready to create a release build you would turn it on then. Also in a normal/large codebase the difference may be minimal (if you are using lots of the STD library etc.) – DarcyThomas Jun 21 '19 at 05:04
  • @Nat The `-Wl,--as-needed` flag removes libraries that you specify in your `-l` flags but you're not actually using. So if you don't use a library, then just don't link against it. You don't need this flag for this. However, if your build system adds too many libraries and it would be a lot of work to clean them all up and only link those needed, then you can use this flag instead. The standard library is an exception though, since it's automatically linked against. So this is a corner case. – Nikos C. Jun 21 '19 at 07:03
  • @Nat --as-needed can have unwanted side effects; it works by checking whether you use any symbol of a library and kicks those out that fail the test. BUT: a library could also do various things implicitly, for example, if you have a static C++ instance in the library then its constructor will be automatically called. There are rare cases where a library you don't explicitly call into is necessary, but they exist. – Norbert Lange Jun 21 '19 at 08:16
  • @NorbertLange Yes, there are some corner cases where `--as-needed` can go wrong. It seems to be very rare though. AFAICT, most Linux distributions build their packages with this flag enabled to deal with packages that just link against everything willy-nilly and so end up bloating the package dependencies. Still, the best solution of course is to keep your build system in good shape so that you always know what libs are actually needed. – Nikos C. Jun 21 '19 at 08:40
  • @NikosC. Build systems do not automatically know which symbols your application uses, and which libraries implement them (varies between compilers, archs, distros and C/C++ libraries). Getting that right is rather troublesome, at least for the base runtime libraries. But for the rare cases you need a library, you should simply use --no-as-needed for that one, and leave --as-needed everywhere else. A use case I've seen is libraries for tracing/debugging (lttng) and libraries that do something of the sort of authenticating/connecting. – Norbert Lange Jun 21 '19 at 09:04

Neither GCC nor Clang is a compiler -- they're actually toolchain driver programs. That means they invoke the compiler, the assembler, and the linker.

If you compile your code with a C or a C++ compiler, the same assembly is produced, and the assembler will produce the same objects. The difference is that the toolchain driver will provide different input to the linker for the two different languages: different startup code (C++ requires code for executing constructors and destructors for objects with static or thread-local storage duration at namespace level, and requires infrastructure for stack frames to support unwinding during exception processing, for example), the C++ standard library (which also has objects of static storage duration at namespace level), and probably additional runtime libraries (for example, libgcc with its stack-unwinding infrastructure).
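
To see why a C++-specific startup exists at all, here is a minimal illustration (the `Tracker` type is purely hypothetical): a namespace-scope object with a constructor has to be constructed before `main()` runs and destroyed after it returns, and the C++ runtime supplies the machinery for that.

#include <cstdio>

// A namespace-scope object with static storage duration. The C++ startup
// code runs its constructor before main() and arranges for its destructor
// to run after main() returns.
struct Tracker {
    Tracker()  { std::puts("constructed before main()"); }
    ~Tracker() { std::puts("destroyed after main()"); }
};

Tracker global_tracker;  // hypothetical example object

int main() {
    std::puts("inside main()");
}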

In short, it's not the compiler causing the increase in footprint, it's the linking in of stuff you've chosen to use by choosing the C++ language.

It's true that C++ has the "pay only for what you use" philosophy, but by using the language, you pay for it. You can disable parts of the language (RTTI, exception handling) but then you're not using C++ any more. As mentioned in another answer, if you don't use the standard library at all you can instruct the driver to leave that out (-Wl,--as-needed), but if you're not going to use any of the features of C++ or its library, why are you even choosing C++ as a programming language?

Stephen M. Webb
  • The fact that enabling exception handling has a cost even if you don't actually use it is a problem. That's not normal for C++ features in general, and it's something that the C++ standards working groups are trying to think of ways to fix. See Herb Sutter's keynote talk at ACCU 2019 [De-fragmenting C++: Making exceptions more affordable and usable](https://www.youtube.com/watch?v=os7cqJ5qlzo). It is an unfortunate fact, though, in current C++. And traditional C++ exceptions will probably always have that cost, even if/when new mechanisms for new exceptions are added with a keyword. – Peter Cordes Jun 26 '19 at 03:52