14

I have a huge problem. I have a common library that is used all across my project. This library makes intensive use of boost.spirit and boost.fusion. Unfortunately, the library is approx. 700 MB in size. All the boost.spirit-heavy code is used, and it works well. What steps can be taken to reduce its output size? Is there a tool that can help determine which template instantiations waste the most space?

First, I decided to move all spirit-aware code to .cpp files. Second, I will try different compiler flags to optimize for size. I don't know what else to do.

Update (details)

I'm using the GNU toolchain. The huge library is actually a static library. The executable that uses this 700 MB library is 200 MB in size. At least half of the code is in *.h files. Some boost.spirit grammars (a very template-heavy thing) are also located in *.h files.

Cheers!

Evgeny Lazin

4 Answers

5

Moving the spirit-aware code to .cpp files is a good first step, though it might be incomplete, as you mention having spirit grammars in header files.

  1. Make sure that none of the grammars / rules are ever exported outside the library. If you have the typical include/src directories, then move those files (even if they are headers) into the src directory.

  2. Mark all those symbols as internal to the library. They should not be accessible from outside the library at all. There are specific pragmas/attributes depending on your compiler; on gcc, look up the visibility attribute: __attribute__ ((visibility ("internal"))). This helps the compiler optimize them accordingly. Notably, a compiler may emit the code of a function even if it inlines it at a given call site, just in case the function's address is taken. With internal visibility, however, since it knows the code will not leave the object, it may elide the function.

  3. I seem to remember a flag to fuse identical function bodies, but I cannot seem to find it again...
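As a minimal sketch of point 2, assuming GCC or Clang: the function names below are hypothetical and merely stand in for a grammar and the library's public entry point. The attribute syntax is the one quoted above; GCC also accepts "hidden" as a visibility value and has a -fvisibility=hidden option that changes the default for a whole translation unit.

```cpp
#include <cassert>
#include <string>

// Hypothetical internal helper, standing in for a Spirit grammar or rule.
// Marked with internal visibility so it is not part of the exported interface.
__attribute__((visibility("internal")))
int digit_value(char c) {
    assert(c >= '0' && c <= '9');
    return c - '0';
}

// Hypothetical public entry point: the only symbol meant to be exported.
__attribute__((visibility("default")))
int sum_digits(const std::string& text) {
    int total = 0;
    for (char c : text) {
        if (c >= '0' && c <= '9') {
            total += digit_value(c);  // internal call, free to be inlined
        }
    }
    return total;
}
```

Callers outside the library would only ever see sum_digits; how much space this saves depends on the compiler and on how the library is linked.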

Matthieu M.
4

-ffunction-sections will put each function in its own section. That is not useful in its own right, but the linker can then remove unused sections with --gc-sections. Without -ffunction-sections this would only work if an entire source file was unused, i.e. with a far too coarse granularity.

Obviously you need the visibility attribute mentioned by Matthieu, else all functions in the library are "used" by virtue of being visible.
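A rough sketch of how this could look with the GNU toolchain. The file and function names are made up for illustration, and the commands in the comment are only one plausible way to pass the flags (-Wl, forwards --gc-sections from g++ to the linker).

```cpp
// Hypothetical file parser.cpp. Build sketch (file names are assumptions):
//   g++ -Os -ffunction-sections -fdata-sections -c parser.cpp -o parser.o
//   g++ main.o parser.o -Wl,--gc-sections -o app
#include <cassert>

// Referenced by the application, so its section is kept by the linker.
int used_helper(int x) {
    assert(x >= 0);
    return x * 2;
}

// Never referenced anywhere: with -ffunction-sections it is placed in its
// own section, which --gc-sections allows the linker to discard.
int unused_helper(int x) {
    return x * 3;
}
```

Without -ffunction-sections both functions would share one .text section, so keeping used_helper would keep unused_helper as well.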

Anne van Rossum
MSalters
2

A few suggestions:

  • where possible, try to reuse the same template instantiations. As a simple (and contrived) example, a std::vector<int> and a std::vector<float> would have the same internal structure, and both can treat their element data as opaque 4-byte blobs. One could therefore delegate to the other and act as a thin wrapper which just casts back to the correct type, so that the internals of the vector only have to be instantiated for one type rather than two.

  • try a different compiler. Some compilers reuse identical template instantiations where it isn't going to affect program semantics, while others are more conservative.

  • keep a close eye on what is exported from the library. Symbols which aren't exported, and aren't referenced internally, can be removed by the linker. (Of course, if you're building a static library, this won't kick in until it is linked into an executable. To reduce the size of the library itself, you could try making it a dynamic library instead)

But ultimately, it sounds like you may just have to use a less template-heavy library. (or write a simpler parser than you currently have)
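The first suggestion (sharing one instantiation behind a thin wrapper) can be sketched as follows. This is only an illustration under stated assumptions: Blob4Store and ThinVector are hypothetical names, the element types are assumed to be 4 bytes wide and trivially copyable, and std::memcpy is used instead of a pointer cast to avoid aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Non-template core: compiled once, stores opaque 4-byte blobs.
class Blob4Store {
public:
    void push_back(std::uint32_t raw) { data_.push_back(raw); }
    std::uint32_t get(std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<std::uint32_t> data_;
};

// Thin typed wrapper: each instantiation only adds trivial conversions.
template <typename T>
class ThinVector {
    static_assert(sizeof(T) == sizeof(std::uint32_t), "T must be 4 bytes");
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
public:
    void push_back(T value) {
        std::uint32_t raw;
        std::memcpy(&raw, &value, sizeof raw);  // T -> opaque blob
        store_.push_back(raw);
    }
    T get(std::size_t i) const {
        const std::uint32_t raw = store_.get(i);
        T value;
        std::memcpy(&value, &raw, sizeof value);  // opaque blob -> T
        return value;
    }
    std::size_t size() const { return store_.size(); }
private:
    Blob4Store store_;
};
```

ThinVector<int> and ThinVector<float> then share all of the storage logic in Blob4Store; only the small conversion functions are instantiated per type, and those are good candidates for inlining.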

jalf
  • One of the problems with reusing template instantiations is that normally each function should have a different address, and that prevents properly fusing different functions together even if their executable code is identical. I don't think that VC++ cares much about that; however, stricto sensu, this optimization is only available if the compiler can prove that the addresses are not used within the program... – Matthieu M. Oct 14 '12 at 15:09
  • @MatthieuM. true, that's why not all compilers do it. It is generally a pretty safe optimization (you don't often compare function addresses, and I suspect that in most cases where it would be invalid, it's fairly straightforward for the compiler to detect it). However, it does do wonders for the executable size of template-heavy code – jalf Oct 14 '12 at 15:37
2

It has been discussed here: why my C++ output executable is so big?

Basically, look at debugging symbols, the order of link dependencies, optimizations and so on...

Community
lucasg