1

I recently updated my Linux laptop from Ubuntu 16.04 to 18.04.

I had an STM32 (Cortex-M4) Makefile-based project that compiled correctly with the arm-none-eabi g++ version provided by Ubuntu. The generated file required 47620 bytes in the .text section.

With the Ubuntu upgrade, I also installed an up-to-date version of gcc (from the ARM website), version 8.2.1.

When I compile the same project (make clean && make), the generated binary does not fit in flash (97424 bytes required, more than twice as much!). The project is exactly the same (sources, linker script, startup files, Makefile).

The compiler options are: -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -DSTM32F303x8 -DARMCM4 -O0 -g -Wall -fexceptions -Wno-deprecated.

The linker options are -mthumb -mcpu=cortex-m4 -Tstm32f303K8.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=nosys.specs -lm -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections -Lsys -Xlinker -Map=test.elf.map

When I look at the generated .map file, all the user functions take approximately the same size (the new version saves 8 bytes!). After them, however, the linker pulls in C++-specific parts, and one of them is more than 26 KB (from the map file): .text 0x00000000080079e8 0x683c /usr/local/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libstdc++.a(cp-demangle.o) 0x000000000800e13c __cxa_demangle

Note: there is no problem with C-only projects, only with C++. The libraries included are the same (gcc 4.9.3 -> armv7e-m/fpu, and gcc 8.2.1 -> thumb/v7e-m+fp/hard): libm.a libstdc++.a libc.a libnosys.a libgcc.a

Is there a way to get rid of that so that I can compile and flash my (not so old) project?

regards,

Mik
  • Have you tried stripping the executable? – Matthieu Brucher Jan 04 '19 at 16:23
  • Why are you using `-O0` rather than the obvious `-Os`? – EOF Jan 04 '19 at 16:44
  • I am using no optimisation as a first approach. Then I turn it on (`-O3 -funroll-loops -fomit-frame-pointer -fno-strict-aliasing -pipe -ffast-math -fexceptions`). But the 2 binaries are compiled using the same flags. (It does not fit in my STM32 with optimisations either.) – Mik Jan 04 '19 at 16:53
  • @user3582893 If you want to reply to a specific user, use @user so they are notified. Also, `-O3 -funroll-loops` is likely to significantly *worsen* the code size. I recommend `-Os` instead. – EOF Jan 04 '19 at 16:57
  • Thanks Mathieu for the tip. I just tried with `-Wl,--strip-all`, but it does not fit : ```arm-none-eabi-g++ -o test.elf build/main.o build/mcp23s17.o build/Print.o build/Adafruit_GFX.o build/Adafruit_SPITFT.o build/Adafruit_ST7735.o build/Adafruit_ST77xx.o build/spi.o build/timer.o build/button.o build/adc.o build/codeur.o build/startup_ARMCM4.o build/startup_clock.o -mthumb -mcpu=cortex-m4 -Tstm32f303K8.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=nosys.specs -lm -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections -Wl,--strip-all -Lsys -Xlinker -Map=test.elf.map ``` – Mik Jan 04 '19 at 16:58
  • @EOF You're right about -funroll-loops (-Os would be better). But I don't think the problem comes from there, because the project is compiled with the same -O0 in both cases, and in the second case the size is more than 2x bigger. – Mik Jan 04 '19 at 17:10
  • gcc's output has not gotten better over time; it got better up to around version 4.x.x, then it didn't. So bigger binaries are not a surprise. There is no reason why you can't have several versions of the toolchain on your computer at the same time; whichever one is first in the path wins. Install both in different places and, depending on the project, choose the one you want. – old_timer Jan 04 '19 at 19:00
  • I have seen -O2 produce smaller binaries than -Os. It looks like you are using -O3 anyway, which is risky compared to -O2, but that should also aim for smaller. See what -Os does though, as an experiment. – old_timer Jan 04 '19 at 19:02
  • I would also try armv6-m instead of armv7-m. armv7-m should produce an overall smaller, faster binary, but it is code dependent, so you might get lucky on size. I have seen the tools use the larger instruction for no apparent reason, so again you might get lucky. – old_timer Jan 04 '19 at 19:04
  • Compare map files, check whether the code size increase is related to new stuff being added or the size of the old code exploded. If that's the latter, comparing disassembly interleaved with source code might give you some clues. – J_S Jan 04 '19 at 20:25

2 Answers

3

I found a solution using libstdc++_nano (instead of the implicit libstdc++). With that, the code size is reduced from 84 KB to 26 KB!

LDFLAGS += -lstdc++_nano

It just works. Thanks @Henrik, @Matthieu and @EOF for your support!

Mik
1

It might be related to exception handling, as std::terminate(), which is used with exceptions, might call the demangling routine. If you don't need exceptions then try disabling them with -fno-exceptions as described here.

Another solution might be to look at the GCC headers:

Demangling routine. ABI-mandated entry point in the C++ runtime library for demangling. [...] returns a pointer to the start of the NUL-terminated demangled name, or NULL if the demangling fails. The caller is responsible for deallocating this memory using free.

The prototype is:

  char*
  __cxa_demangle(const char* __mangled_name, char* __output_buffer,
         size_t* __length, int* __status);
So you could probably just supply your own dummy function returning NULL. (The linker only pulls an object out of a static library to resolve a symbol that is still undefined, so a definition in your own code should take precedence over the one in libstdc++.a.) I'd advise you to look at the disassembled code first though, and find out how and why it is being called in the first place, since simply discarding the functionality might change behaviour.
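
A dummy replacement along those lines could look like the sketch below. It is only an illustration, not something taken from the toolchain: the parameter names simply follow the prototype quoted above, and reporting -1 (the "memory allocation failure" status documented in cxxabi.h) is an assumption about what a caller would tolerate. Whether this really keeps cp-demangle.o out of the image depends on nothing else in that object being referenced, so check the map file after linking.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical stub that stands in for the libstdc++ demangler.
// It never demangles anything: it reports failure and returns a null pointer,
// so callers fall back to whatever they do when demangling is unavailable.
extern "C" char* __cxa_demangle(const char* /*mangled_name*/,
                                char* /*output_buffer*/,
                                std::size_t* /*length*/,
                                int* status) {
    if (status != nullptr) {
        *status = -1;  // assumption: report "memory allocation failure"
    }
    return nullptr;    // no buffer is allocated, so there is nothing to free
}
```

Put it in any source file of the project so that the linker sees the definition before it searches libstdc++.a.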

They also give other advice in this forum post, which might be useful for you as well:

  • Optimize for size with -Os instead of -O0 (or use -Og instead if you prefer easily debuggable code; it is often both smaller and faster than -O0).
  • Optimize at link-time with -flto while compiling and linking.
  • Maybe disable RTTI if not used.
  • Thank you Henrik. I tried with -Os, -fno-exceptions, -flto and -fno-rtti. The code is significantly reduced. However, it does not yet fit into my MCU (the reduction is 14kb, but there are still 19kb to remove to fit in flash :/) – Mik Jan 04 '19 at 18:49
  • Okay @user3582893 I guess you should look into the listing again then. Some other library functions might have been added. – Henrik Juul Pedersen Jan 04 '19 at 18:54
  • Try to look for malloc or similar. Could you have gotten a heap somehow? – Henrik Juul Pedersen Jan 04 '19 at 19:06