Modifying gcc compilation to improve embedded Flash size

Question

Introduction

I am running out of Flash on my Cortex-M4 device. I analysed the code, and the biggest opportunity to reduce code size is simply in predefined constants.

- Example

const Struct option364[] = {
   { "String1",  0x4523, "String2" },
   { "Str3",     0x1123, "S4" },
   { "String 5", 0xAAFC, "S6" }
};

Problem

The problem is that I have a (large) number of (short) strings to store, but most of them are used in tables - arrays of const structs that have pointers to the const strings mixed with the numerical data. Each string is variable in size; even so, I looked at changing the struct to hold a fixed-size (maximum-length) char array instead of a pointer - and there wasn't much difference. It didn't help that the compiler wanted to start each new string on a 4-byte boundary, which got me thinking...

Idea

If I could replace the 4-byte char pointer with a 2-byte index into a string table - a predefined linker section to which index was an offset - I would save 2 bytes per record right there, at the expense of a minor code bump. I'd also avoid the interior padding, since each string could start immediately after the previous string's NUL byte. And if I could be clever, I could re-use strings - or even part-strings - for the indexes.

What's more, each record would shrink from 4 + 2 + 4 (+ 2 bytes of padding) = 12 bytes to 2 + 2 + 2 = 6 bytes - saving even more space!
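
The two layouts can be compared with a minimal sketch. The field names `name`, `value` and `text` are assumptions, since the real definition of `Struct` is not shown; on a 32-bit target such as the Cortex-M4 the pointer version would normally occupy 12 bytes, while a desktop build with 8-byte pointers will report more.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Current layout (field names assumed): two string pointers around a
 * 16-bit value. With 4-byte pointers this is 4 + 2 + 2 (padding) + 4. */
typedef struct {
    const char *name;
    uint16_t    value;
    const char *text;
} PointerRecord;

/* Proposed layout: 16-bit offsets into a shared string table. */
typedef struct {
    uint16_t name;    /* offset of the first string  */
    uint16_t value;
    uint16_t text;    /* offset of the second string */
} IndexRecord;

/* Three 2-byte members need no padding, so the record is 6 bytes. */
static_assert(sizeof(IndexRecord) == 6, "IndexRecord should not be padded");
static_assert(offsetof(IndexRecord, text) == 4, "unexpected member offset");
```

Printing `sizeof(PointerRecord)` with the actual cross-compiler is the quickest way to confirm the per-record saving for the real struct.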

- Consideration

Of course, inside the source code the housekeeping on all those strings, and the string table itself, would be a nightmare... unless I could get the compiler to help? I thought of changing the syntax of the actual source code: if I wanted a string to be in the string table, I would write it as #"String", where the # prefix would flag it as a string table candidate. A normal string wouldn't have that prefix, and the compiler would treat it as normal.

Implementation

So to implement this I'd have to write a pre-pre-compiler. Something that would process just the #"" strings, replacing them with "magic" 16-bit offsets, and then output everything else to the real (pre)compiler to do the actual compilation. The pre-pre-compiler would also have to write a new C file with the complete string table inside (although with a trick - see below), for the compiler to parse and provide to the linker for its dedicated section. Invoking this would be easy with the -no-integrated-cpp switch, to invoke my own pre-pre-processor that would in turn invoke the real one.
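
The heart of such a pre-pre-compiler would be a routine that turns each #"" string into an offset, reusing an existing entry where possible. The following is only a sketch under assumptions: `intern`, `table` and `TABLE_MAX` are hypothetical names, the search is a plain linear scan, and the parsing of the source file is left out entirely.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_MAX 65536u          /* a 16-bit offset can address 64 KiB */

static char   table[TABLE_MAX];   /* the string blob being built        */
static size_t table_len = 0;      /* bytes used so far                  */

/* Return the offset of `s` in the table. The string is appended only if
 * no existing entry already ends with it; because the comparison includes
 * the terminating NUL, a match can be a whole string or the tail of one.
 * Returns -1 if the table would exceed 64 KiB. */
long intern(const char *s)
{
    size_t n = strlen(s) + 1;     /* length including the NUL */

    for (size_t off = 0; off + n <= table_len; ++off) {
        if (memcmp(table + off, s, n) == 0)
            return (long)off;     /* reuse the existing bytes */
    }

    if (table_len + n > TABLE_MAX)
        return -1;                /* offset would not fit in 16 bits */

    memcpy(table + table_len, s, n);
    long off = (long)table_len;
    table_len += n;
    return off;
}
```

With this scheme, interning "String1" and then "ing1" stores only one copy, since "ing1" is found at offset 3 of the first entry. Sharing only works when the longer string is already in the table, so feeding the strings in longest-first order would catch more overlaps.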

- Issues

Don't get me wrong; I know there are issues. For example, it would have to be able to handle partial builds. My solution there is that for every modified C file, it would write (if necessary) a parallel string table file. The "master" C string table file would be nothing more than a series of #includes, that the build would realise needed recompiling if one of its #includes had changed - or indeed, if a new #include was added.

Result

The upshot would be an executable that would have all the (constant) strings packed into a memory blob of no larger than 64K (not a problem!). The code would know that index would be an offset into that blob, so would add the index to the start of the string table pointer before using it as normal.
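
A minimal sketch of what the generated output and the lookup might look like follows. The names `string_table`, `IndexRecord` and `str_at` are assumptions, and the offsets are written by hand here where the pre-pre-compiler would generate them; placing the blob in its own linker section is omitted.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical generated blob: strings packed back to back, no padding. */
static const char string_table[] =
    "String1\0"       /* offset  0 */
    "String2\0"       /* offset  8 */
    "Str3\0"          /* offset 16 */
    "S4";             /* offset 21; the compiler adds the final NUL */

/* Every offset must fit in 16 bits. */
static_assert(sizeof(string_table) <= 65536, "string table exceeds 64 KiB");

typedef struct {
    uint16_t name;    /* offset into string_table */
    uint16_t value;
    uint16_t text;    /* offset into string_table */
} IndexRecord;

/* What the first two rows of option364 could become after rewriting. */
static const IndexRecord option364[] = {
    { 0,  0x4523, 8  },
    { 16, 0x1123, 21 },
};

/* The "minor code bump": add the offset to the table base. */
static inline const char *str_at(uint16_t offset)
{
    return string_table + offset;
}
```

Code that previously used the pointer member directly would call `str_at(option364[i].name)` instead.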

Question

My question is: is it worth it?

- Pros:

It would save a tonne of space. I didn't quantify it above, but assume a saving of 5%(!) of total Flash.

- Cons:

It would require the build process to be modified to include a bespoke preprocessor;
That preprocessor would have to be built as part of the toolchain rather than the project;
The preprocessor could have bugs or limitations;
The real source code wouldn't compile "out of the box".

Now...

I have donned my asbestos suit, so... GO!

You could always just write in assembler :-) String alignment on a 4-byte boundary doesn't sound right; what compiler options are you using? — Steve Friedl, Jul 27 '20 at 13:30
@SteveFriedl You've obviously seen my sig-block! The compiler puts all string constants into `.rodata`, which has a 4-byte alignment (as it should). I simply _cannot_ find a way to override where it puts it. — John Burger, Jul 27 '20 at 13:31
Your idea seems very complicated. Let's look at the problem from the device side. The 4-byte alignment can be hacked in the linker descriptor (ld file), but that could cause other problems. The alignment consumes space when you define multiple strings, and it applies to assembler as well. If string retrieval time isn't a big concern, an option is a single big string that is accessed sequentially by counting the NUL terminators. Very slow for the very last strings, but possibly workable... — Frankie_C, Jul 27 '20 at 13:57
All the strings combined would still fit into 64K? (such that every string occurrence could be replaced by big_blob_buffer+offset ) — wildplasser, Jul 27 '20 at 14:00
@Frankie_C Nice idea - but I'm looking at how to do it across a LARGE number of C files. Whether the `index` field is "offset into string table" versus "which string to count" doesn't matter - I think you'd agree my idea is better for that. My question is how to make the _source code_ easier to manage. The access is merely details... — John Burger, Jul 27 '20 at 14:00
@wildplasser Indeed. I have less than 64K worth of strings, which makes sense... I have only 128K of Flash! — John Burger, Jul 27 '20 at 14:01
In that case: concatenate them (including NULs). A pre-preprocessor picks them up and puts them in the big_buffer (+ temp hash table). The original strings are replaced by a (static) function call or macro, using the found index as its argument. — wildplasser, Jul 27 '20 at 14:05
@wildplasser I'd love to: but what would the source code look like? Remember, I'm defining about 1,000 array element entries with what amounts to ` { "String1", 0x4523, "String2" }` (or their equivalent). I'm trying to streamline that with ` { #"String1", 0x4523, #"String2" }` — John Burger, Jul 27 '20 at 14:08
OK. Hack the ld file and create a new section with alignment==1. Define your strings there using gcc `__attribute__ ((section ("my_sec")))`. — Frankie_C, Jul 27 '20 at 14:09
@JohnBurger LD file: linker descriptor file. The file that describes the memory/peripheral layout of your chip. The one holding: ` /* Constant data goes into FLASH */ .rodata : { . = ALIGN(4); *(.rodata) /* .rodata sections (constants, strings, etc.) */ *(.rodata*) /* .rodata* sections (constants, strings, etc.) */ . = ALIGN(4); } >FLASH ` or the like — Frankie_C, Jul 27 '20 at 14:11
No, in your unprocessed source, the strings are still all intact (possibly wrapped in an `EXPORT("OMG this is a string")` wrapper). After pre-preprocessing they will have been replaced by big_buffer+index (possibly as another function or macro) — wildplasser, Jul 27 '20 at 14:13
@Frankie_C As far as I know, `__attribute__((section(".my_sec")))` (note the dot) is purely `C` syntax. I hack `.ld` files for a living, and unless the `C` file references it, you're out of luck. I'm trying to make the `C` syntax as "clean" as possible, and no linker file solution is going to help this. — John Burger, Jul 27 '20 at 14:14
@wildplasser Intriguing - but that would still replace every 4-byte pointer with another 4-byte pointer. I'm asking whether it's possible to replace it with a _2_ -byte pointer, which your macro fix wouldn't do. — John Burger, Jul 27 '20 at 14:16
No, the index can be a 16 bit token number (in tables), or a 32 bit numeric literal (in constants) (the latter indeed makes little sense) — wildplasser, Jul 27 '20 at 14:22
@Frankie_C I just saw your edit. The only way that I have found to change where the compiler puts constant strings is to define them as completely separate identifiers, and reference them where necessary. Given that I have about 2,000 strings hard-coded inside `struct`s, re-defining each of them as their own string just to put them into their own section? Sorry, no. The idea is to keep the syntax where they are, and merely modify how they are processed. — John Burger, Jul 27 '20 at 14:23
@wildplasser But how do I define, create and reference the 16-bit indices? The compiler won't. The linker won't. I have to do them myself - which implies a pre-pre-processor — John Burger, Jul 27 '20 at 14:24
The pre-pre-processor gathers them, rewriting the source (including data types!). I agree: it takes a lot of instrumentation for maybe a small gain. — wildplasser, Jul 27 '20 at 14:25
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218690/discussion-between-john-burger-and-wildplasser). — John Burger, Jul 27 '20 at 14:26
I remember reading about a plugin to gcc or something like that that would go one step further - it would _compress_ all string literals and decompress them when needed. — KamilCuk, Jul 27 '20 at 16:33

score 0 · Answer 1 · answered Jul 27 '20 at 19:55

This kind of "project custom preprocessor" used to be fairly common back in the days when memory was pretty constrained. It's pretty easy to do if you use make as your build system -- just a custom pattern or suffix rule to run your preprocessor.

The main question is whether you want to run it on all source files or just some. If only a couple need it, you define a new file extension for source files that need preprocessing (e.g., .cx and a .cx.c: rule to run the preprocessor). If all need it, you redefine the implicit .c.o: rule.

The main drawback, as you noted, is that if there's any sort of global coordination (such as pooling all the strings like you are trying to do), changing any source file needing the preprocessor will likely require rebuilding all of them, which is potentially quite slow.