Why does iostream take so much flash space on an MCU?

Question

I use GCC 5.2.0 to compile code for an EFM32 MCU (based on a Cortex-M core). I notice an awful increase in code size when I want to #include <iostream>.

For example, let's compile the following code for an EFM32WG "Wonder Gecko" chip:

#include "em_device.h"
#include "em_chip.h"
#include <iostream>

int main(void)
{
  CHIP_Init();

  while (1) {
  }
}

This code will result in 172048 bytes of code, whereas without #include <iostream> it is only 1440 bytes.

I usually just use cout for debug output (by implementing the _write function for newlib and routing the output to the SWO pin), but it looks like this approach is very wasteful, considering the MCU only has 256k of flash, and just including this header will make the code use up most of it.

So, my question is: why does including the iostream header make the compiled code take such an insane amount of flash space? And is there a way to fix it?

EDIT:

Both the compiler and the linker are arm-none-eabi-g++ (version 5.2.0), and the C library is the nano C library (I think).

Here are my C++ compiler flags (excluding the include paths):

-g -gdwarf-2 -mcpu=cortex-m4 -mthumb '-DEFM32WG940F256=1' -O0 -Wall -c -fmessage-length=0 -mno-sched-prolog -fno-builtin -ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=softfp

Here are my linker flags:

-g -gdwarf-2 -mcpu=cortex-m4 -mthumb -T "${BuildArtifactFileBaseName}.ld" --specs=nosys.specs -Xlinker --gc-sections -Xlinker -Map="${BuildArtifactFileBaseName}.map" -mfpu=fpv4-sp-d16 -mfloat-abi=softfp --specs=nano.specs

I tried both with and without optimizations, but the resulting code size remains about the same (the optimized size is maybe 1k smaller).

EDIT 2

-fno-rtti and -fno-exceptions do not help with the code size either.

At least it's not the 8.6MB of glibc's libstdc++, but any way you look at it the C++ standard library is not a small thing... Anyway, which standard library implementation are you using (newlib?), and with what compiler/linker/optimisation options? — Notlikethat, Aug 13 '16 at 19:23
You should note that just `#include <iostream>` instantiates `cout`, `cin` and all that, no matter if you use it or not. — πάντα ῥεῖ, Aug 13 '16 at 19:23
@Notlikethat Glibc doesn't have a libstdc++. The latter comes with GCC. — rubenvb, Aug 13 '16 at 19:25
You should see what using a VLA does to the code size. Jinkies! — user4581301, Aug 13 '16 at 19:28
Why are you using `cout`? It's trivial to write your own `printf` which is usually done on embedded systems. And I highly doubt an entire `iostream` implementation is appropriate for something with only 256k of memory. — uh oh somebody needs a pupper, Aug 13 '16 at 19:33
Try disabling RTTI and exceptions and see if that reduces the bloat with `-fno-rtti` and `-fno-exceptions`. — uh oh somebody needs a pupper, Aug 13 '16 at 19:36
@uhohsomebodyneedsapupper Yes, I'm starting to realize that, but I would also like to understand better what is happening behind the scenes. — Venemo, Aug 13 '16 at 19:36
@uhohsomebodyneedsapupper: Unfortunately, there's no way to write a printf-style routine that uses `float` rather than `double`, since the Standard mandates that variadic functions promote the former to the latter, and also mandates that `double` have a certain minimum precision even though 32 or 48 bits would suffice in many embedded systems and 64 bits would be very costly in terms of both code space and time. — supercat, Aug 13 '16 at 19:42
@rubenvb Ah, I never realised that distinction, thanks for educating me. Of course, if that then means a bare-metal-targeted GCC is still packing the same library implementation as for a full-blown Linux/UNIX target, then that [pretty much sums up the problem in a nutshell](https://gcc.gnu.org/onlinedocs/libstdc++/faq.html#faq.size). — Notlikethat, Aug 13 '16 at 19:44
I suspect that most of the bloat is coming from `-g`. Debugging information adds **a lot** of bytes to the binary. — uh oh somebody needs a pupper, Aug 13 '16 at 19:45
@Notlikethat You have a misunderstanding here. `libstdc++` needs to be ported to the target system, which requires porting a C library (newlib) and writing the glue (libgcc and others). GCC doesn't provide an embedded "libstdc++", as the C++ specification does not either. It's simply a developer preference. — uh oh somebody needs a pupper, Aug 13 '16 at 19:47
If you add any of these standard libraries, printf (not a roll-your-own one), which has massive dependencies, or cout, the answer is the same. If all you are after is debug printing, that can be done in a couple dozen lines of code. The format is not as pretty, but there is no reason for fancy formatting in debug output. You have to weigh the pros and cons of every standard library you want, even math libraries for basic math functions. It all adds bloat to the binary, which is fine for apps on a host, but for bare-metal embedded... not so much. — old_timer, Aug 14 '16 at 03:06
As to your specific question, should be simple to use readelf or objdump or various other solutions to determine where the bloat is. THEN ask why if it is not obvious. — old_timer, Aug 14 '16 at 03:11
@uhoh Gotcha - pity a poor kernel hacker, I've not actually linked anything against a standard library in years ;) Taking a closer look at the ARM bare-metal GCC toolchain I have handy, I see it offers not only its own libstdc++.a but also a smaller libstdc++_nano.a, weighing in at a svelte 2.7MB... — Notlikethat, Aug 15 '16 at 22:19
@Venemo and forget about the exceptions as well. Generally speaking, when programming uCs you should forget printf, iostream and malloc. — 0___________, Dec 10 '19 at 15:11

score 3 · Accepted Answer · answered Dec 10 '19 at 14:01

While the compiler does try to eliminate unused includes, or the unused parts of them, this sometimes fails. Some headers cause code to run merely by being included, which means that even if you do not refer to anything from the header, the compiler is not free to remove that code.

<iostream> is such an example as it declares some global objects whose constructors are run before main is called. Its inclusion will roughly increase the binary size for an STM32 by 140kB.
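The mechanism can be illustrated with a minimal sketch. `StreamInit` and `stream_init_guard` are hypothetical stand-ins, not the library's real names; they play the role of the hidden static object that libstdc++ of this era places in every translation unit that includes <iostream>. Because the object's constructor has side effects, it must run before main, so neither the compiler nor `--gc-sections` can discard it or anything it references.

```cpp
// Hypothetical stand-in for the hidden initializer object that <iostream>
// defines at file scope. The names here are illustrative only.
struct StreamInit {
    static int live_count;

    // In the real library, this is where cout, cin, cerr, their buffers
    // and the locale machinery get constructed.
    StreamInit()  { ++live_count; }
    ~StreamInit() { --live_count; }
};

int StreamInit::live_count = 0;

// A file-scope object with a non-trivial constructor: it is initialized
// before main() starts, whether or not the program ever uses it.
static StreamInit stream_init_guard;

// Reports how many initializer objects have been constructed so far.
int initialized_streams() {
    return StreamInit::live_count;
}
```

Calling `initialized_streams()` as the first statement of main already reports that the guard object has been constructed, even though nothing in the program refers to it by name.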

You can check this behaviour and reasoning of the gcc developers on github.

The solution is to avoid <iostream> on microcontrollers and use what C offers for printing, such as printf().

Yes, that's exactly what I ended up doing. — Venemo, Jan 09 '20 at 15:01
