
I am trying to sort out an embedded project where the developers took the option of #including all the .h and .c files into a single .c file, so that they could compile just that one file with the -fwhole-program option to get good size optimization.
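The build amounts to a unity build, roughly like this (a sketch from me, not the real code; all file names are invented):

```c
/* all.c -- sketch of the existing unity build (file names invented).
 * Every header and source file is #included into this one file,
 * which is then compiled in a single step, e.g.:
 *
 *     aps-gcc -fwhole-program all.c -o firmware.elf
 */
#include "module_a.h"
#include "module_b.h"

#include "module_a.c"
#include "module_b.c"
#include "main.c"
```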

I hate this and am determined to turn it into a traditional multi-file program, using LTO to achieve the same result.

The versions included with the dev kit are: aps-gcc (GCC) 4.7.3 20130524 (Cortus) and GNU ld (GNU Binutils) 2.22.

With one .o file, .text is 0x1c7ac; fractured into 67 .o files, .text comes out as 0x2f73c. Adding LTO reduced it to 0x20a44, which is good but nowhere near enough.

I have tried --gc-sections and the linker plugin option, but they made no further improvement.
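For reference, the fractured build looks roughly like this (a sketch; file names invented). As I understand it, --gc-sections can only discard whole sections, so I compile with -ffunction-sections and -fdata-sections to give it something to discard:

```c
/* Build sketch (commands shown as comments; file names invented):
 *
 *   aps-gcc -Os -flto -ffunction-sections -fdata-sections -c module_a.c
 *   aps-gcc -Os -flto -ffunction-sections -fdata-sections -c module_b.c
 *   ...
 *   aps-gcc -Os -flto -fuse-linker-plugin -Wl,--gc-sections \
 *           module_a.o module_b.o ... -o firmware.elf
 */
```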

Any suggestions? Am I seeing the right sort of improvement from LTO?

Chris Aaaaa
  • Is bumping the toolchain versions an option at all? – Joe Oct 13 '15 at 15:13
  • I was going to suggest the same; IIRC later versions of GCC made notable improvements in the LTO area. Also, use profile feedback to further reduce executable size. And compile with `-Os`, of course. – user703016 Oct 13 '15 at 15:28
  • Why not make a tool that creates a single big file at compile time, when compiling for final production? This is what the SQLite team does. – pm100 Oct 15 '15 at 23:19
  • Working on getting an upgrade; surprisingly difficult for a free tool. The brochureware suggests 4.9.1 is around. – Chris Aaaaa Nov 10 '15 at 10:16
  • I have an idea: run a double build, where the release build compiles the file that includes everything and the debug build compiles the objects independently. So far I am flummoxed by running out of memory on the target as I fracture the code, so I cannot keep testing as I refactor. This is all made more difficult by development continuing on the huge monolithic block: each individual C file sees the includes of those parsed before it, so individual files compile differently standalone, or not at all. – Chris Aaaaa Nov 10 '15 at 10:20

2 Answers


To get LTO to work perfectly you need to have the same information and optimisation algorithms available at link stage as you have at compile stage. The GNU tools cannot do this and I believe this was actually one of the motivating factors in the creation of LLVM/Clang.

If you want to inspect the difference in detail, I'd suggest you generate a map file (ld option -Map <filename>) for each build and look for functions that haven't been in-lined or functions that are larger. The lack of in-lining you can resolve manually by forcing those functions inline: move the definition of each such function into a header file and declare it extern inline, which effectively turns it into a macro (this is a GNU extension).
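A minimal sketch of the idiom (the function name is invented), using gcc's default gnu89 inline semantics:

```c
/* util.h -- the GNU "extern inline" idiom (gnu89 semantics).
 * The definition below is used only for inlining; no out-of-line
 * copy is emitted from this header, so each call site has the body
 * expanded in place, much like a type-safe macro. */
#ifndef UTIL_H
#define UTIL_H

extern inline int clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

#endif /* UTIL_H */
```

One caveat: if a call is not inlined (for example at -O0, or if the function's address is taken), the object file references an external clamp_u8 symbol, so one ordinary non-inline definition must be provided in a single .c file as a fallback.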

Larger functions are likely not being subjected to constant propagation, and I don't think there's anything you can do about that. You can make some improvements by carefully declaring function attributes such as const, leaf, noreturn, pure, and returns_nonnull. These effectively promise that the function will behave in a particular way, something the compiler could otherwise only detect when using a single compilation unit, and they allow additional optimisations.
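A hedged sketch of what such declarations look like (all function names are invented; note that returns_nonnull first appeared in GCC 4.9, so it may not be available on the 4.7.3 toolchain in the question):

```c
/* Illustrative prototypes only -- all names are hypothetical. */

/* const: result depends only on the argument values, reads no memory. */
int saturate(int v) __attribute__((const));

/* pure: may read memory but has no side effects. */
int checksum(const unsigned char *buf, unsigned len) __attribute__((pure));

/* noreturn: never returns to the caller. */
void panic(const char *msg) __attribute__((noreturn));

/* leaf: does not call back into the current compilation unit. */
int write_reg(unsigned addr, unsigned val) __attribute__((leaf));

/* returns_nonnull (GCC >= 4.9): the returned pointer is never null. */
void *get_tx_buffer(void) __attribute__((returns_nonnull));
```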

In contrast, Clang compiles your code to a special kind of bytecode called LLVM bitcode (LLVM stands for Low Level Virtual Machine, like JVM is Java Virtual Machine, and runs bytecode), and optimisation of this bitcode can then be performed at link time (or indeed at run-time, which is cool). Since this bitcode is what gets optimised whether you do LTO or not, and the optimisation algorithms are common between the compiler and the linker, in theory Clang/LLVM should give exactly the same results whether you use LTO or not.
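As an illustration, the pipeline is typically driven like this (commands shown as comments; file names invented):

```c
/* Sketch of a Clang LTO build:
 *
 *   clang -Os -flto -c a.c -o a.o    # a.o contains LLVM bitcode,
 *   clang -Os -flto -c b.c -o b.o    # not target machine code
 *   clang -Os -flto a.o b.o -o app   # code generation and whole-
 *                                    # program optimisation happen here
 */
```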

Unfortunately now that the C backend has been removed from LLVM I don't know of any way to use the LLVM LTO capabilities for the custom CPU you're targeting.

Parakleta

In my opinion, the method chosen by the previous developers is the correct one. It is the method that gives the compiler the most information, and thus the most opportunities to perform the optimizations you want. It is a terrible way to compile (any change requires recompiling the whole project), so keeping it as just one build option is a good idea.

Of course, you would have to run all your integration tests against such a build, but that should be trivial to do. What is the downside of the chosen approach except for compilation time (which shouldn't be an issue, because you don't need to build in that manner all the time ... just for integration tests)?

dave