I made some changes to a C codebase so that it could compile under G++. Seems to be working, with some annoyances and the hack of -fpermissive -fshort-wchar.

Out of curiosity I compared the stripped -O2 size of the GCC-built executable before my changes with the G++-built executable after my changes. The "after" was 32 bytes larger (on a 500K-ish binary). I was pleasantly surprised it was so close, but idly wondered: if the optimizer is that consistent, why isn't it 100% consistent? Maybe something about adding that overload for strchr caused it.

Not important enough to me to worry about. But then I decided to do a C build with GCC that included my C++ compatibility changes. That stripped -O2 executable was 4096 bytes bigger than the C build prior to my changes.

Does anyone have intuition on why these three sizes would happen this way, and why it would be such a "round" number? The C++ changes were basically all things that should be optimized out, whether in C or C++. Basically:

  • introduction of opaque typing so that functions previously declared in the interface as taking void* would name-mangle consistently; a macro casts each opaque-typed value into a local of the proper internal type

  • elimination of a few instances of old-style C function header definition

  • modification of linkage to "extern" for some global constants that hadn't specified linkage before, temporarily tolerating keeping the assignments in the headers but hoping to argue against that

  • changing some signed chars to unsigned chars, and some unsigned longs to unsigned int (but never vice versa)
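
To make the first bullet concrete, here is a minimal sketch of the opaque-typing pattern. All of the names (`widget_handle`, `widget_internal`, `WIDGET_LOCAL`, `widget_count`) are hypothetical and are not taken from the actual codebase; the point is only that the cast is a pointer reinterpretation that an optimizer would normally be expected to reduce to nothing:

```c
/* Hypothetical names throughout; none of these come from the real project. */

/* Public interface: an opaque handle type instead of void*, so that a
 * C++ compiler mangles each function signature distinctly. */
typedef struct widget_opaque widget_handle;

/* Internal representation, known only to the implementation. */
struct widget_internal {
    unsigned int count;   /* was unsigned long before the changes (assumed) */
    unsigned char flags;  /* was signed char before the changes (assumed) */
};

/* Declares a local of the internal type, initialized by casting the
 * opaque handle. The cast carries no runtime work of its own. */
#define WIDGET_LOCAL(name, handle) \
    struct widget_internal *name = (struct widget_internal *)(handle)

unsigned int widget_count(widget_handle *h)
{
    WIDGET_LOCAL(w, h);
    return w->count;
}
```

A caller would obtain a `widget_handle *` from the library and pass it back in without ever seeing `struct widget_internal`.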

If anyone has a good intuition for this optimization case, then it would save me the time of backing out each set of related changes individually to see how they affect the size...!

  • All these size differences are so small, I would just totally ignore them. It's extremely unlikely that they suggest anything interesting, broken, or relevant. It could even just be linker order causing different amounts of padding. – David Schwartz Dec 17 '12 at 17:46
  • An ELF executable has a lot of things inside, and some of them are rounded up to a size that is a multiple of a given power of 2. That's probably why you see 4096. You should check the output of `objdump -x` on both programs and compare them piece by piece. – rodrigo Dec 17 '12 at 17:48
  • @DavidSchwartz I'd agree under most circumstances, and would like to ignore them, but I'm trying to slipstream some changes into a C project that is very size-conscious and not necessarily friendly to bowing to modifications that serve those interested in building with a C++ compiler. So it's good to at least have answers. If it's an ELF chunk-size and could be explained in the noise, then being able to show that will be handy... – HostileFork says dont trust SE Dec 17 '12 at 18:00
  • Slight correction to rodrigo's comment. The ELF can be organized into blocks to improve swapping of code pages. 4096 is a common block size. It doesn't require a power of two for the entire executable. Whether this tactic is used is entirely compiler dependent. It can mean that if a 32-byte size addition takes you over a block boundary, you pay for the whole block. – Mel Nicholson Dec 17 '12 at 18:19
  • Another possibility is that there are certain constructs that both a C compiler and a C++ compiler will accept, but have (sometimes subtly) different semantics, which will lead the compiler to either generate slightly different code and/or the optimizer to make some different decisions about rearranging stuff. It might be useful to compare generated assembly output to see what differences there are... – twalberg Dec 17 '12 at 19:12
  • @twalberg: These differences between C and C++ are few, subtle and generally quite artificial. I'd bet more on something like exception stack frames, RTTI, or struct constructors/destructors/copy-operators. Or maybe simply different optimization choices due to different expected usage. – rodrigo Dec 17 '12 at 20:44

1 Answer

Remember: a C++ program is not intrinsically larger than a C program. The compilation may take more time, but there's nothing that inherently makes a C++ program larger.

And actually...due to the stricter formalisms and wiggle room left for undefined behavior, a C++ compiler may be able to make more optimizations on a C codebase than a C compiler could.

The differences you (a.k.a. me) are describing are small. And as @rodrigo and @MelNicholson point out, even a tiny difference may be exaggerated by block size rounding. A single byte of actual difference could thus lead to 4096 bytes of file size difference.
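
As a rough illustration of that rounding effect, here is a small sketch. It assumes a 4096-byte block size and a hypothetical helper named `round_up`; the real alignment depends on the linker and target, so treat the numbers as an example rather than a description of what GCC did here:

```c
#include <stddef.h>

/* Assumed block size; 4096 is common but not guaranteed. */
#define BLOCK_SIZE 4096u

/* Round size up to the next multiple of align (align must be nonzero). */
size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}
```

With these assumptions, `round_up(8192, BLOCK_SIZE)` is 8192 while `round_up(8193, BLOCK_SIZE)` is 12288, so one extra byte in a padded region costs a full 4096 bytes on disk.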

It might be interesting to grab a bunch of regression tests for a C compiler, build it as C vs. C++ and see if there are any notable differences in executable size, then look for patterns in what causes those differences. If you (a.k.a. me) have the time. It might provide data to feed back into places the compilers could be improved. But there's basically nothing at all to be learned from a single-instance difference of this size on a 500K executable.