How can I group RAM variables to prevent padding with the GNU compiler toolchain?

Question

We have a substantial embedded project for the ARM Cortex-M33 processor, written in about 30,000 lines of C. We use the GNU compiler/linker toolchain. There is a mix of custom software, some open-source libraries (e.g. FatFS), and vendor-specific libraries (e.g. a Bluetooth stack). We have about 500 named static variables and data structures that occupy RAM. Variables range in size from one byte up to hundreds of bytes for a structure; most are one, two, or four bytes.

The linker groups all variables from a single file by size and adds alignment bytes when switching between groups of four-byte and two-byte variables, or two-byte and one-byte variables. The amount of RAM lost to this alignment has become a problem, as RAM is limited.

I would like to somehow instruct the compiler and linker to put all one-byte variables into a single linker section, two-byte variables into another, four-byte variables into a third, and everything else into a fourth. Alignment for each section could then be controlled, and the alignment padding should disappear or be minimized. It would also work if I could tell the linker to sort variables by size when assigning memory addresses.

How can I do this, or something like it? The GNU compiler and linker manuals don't seem to describe any way to do this. The RAM lost to filling in these gaps is now about 3% of our total available memory, and we are running out. I am looking for creative suggestions and would like to hear about any partial solutions you know of.
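The effect described in the question can be modelled in a few lines. This is a simplified sketch, not the linker's actual algorithm: it assumes each object only needs its start offset rounded up to its alignment, and compares the bytes used when the objects are placed in an interleaved order against the same objects sorted by decreasing alignment. The names `struct obj`, `layout_bytes` and `sorted_layout_bytes` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One object as the linker sees it: its size and its required alignment.
   'align' must be a power of two (1, 2, 4, ...). */
struct obj {
    size_t size;
    size_t align;
};

/* Place the objects back to back in the given order, rounding each start
   offset up to the object's alignment. Returns the total bytes used,
   including any padding gaps. */
static size_t layout_bytes(const struct obj *objs, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        size_t mask = objs[i].align - 1;
        offset = (offset + mask) & ~mask;   /* pad up to the alignment */
        offset += objs[i].size;
    }
    return offset;
}

/* qsort comparator: larger alignment first. */
static int by_align_desc(const void *a, const void *b)
{
    size_t aa = ((const struct obj *)a)->align;
    size_t ab = ((const struct obj *)b)->align;
    return (aa < ab) - (aa > ab);
}

/* Same layout after sorting a copy (in 'scratch', which must hold n
   entries) by decreasing alignment. */
static size_t sorted_layout_bytes(const struct obj *objs, size_t n,
                                  struct obj *scratch)
{
    memcpy(scratch, objs, n * sizeof *objs);
    qsort(scratch, n, sizeof *scratch, by_align_desc);
    return layout_bytes(scratch, n);
}
```

For six objects with sizes and alignments {1, 4, 1, 2, 1, 4} in that order, `layout_bytes` returns 20 while `sorted_layout_bytes` returns 13, so the 7 bytes of padding disappear. Sorting by decreasing alignment leaves no gaps whenever every object's size is a multiple of its alignment, which is why grouping by size or sorting by alignment addresses this kind of loss.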

The *compiler* usually adds padding to make sure all variables are aligned on their optimal address. If you rearrange the order so that all one-byte variables (and structure members) are grouped together in the source code, you avoid the padding caused by having larger variables in between. The same goes for all the other types, and for arrays: place them so that there's no padding. You *can* mix different types, as long as you're aware of the alignment requirements of each type. — Some programmer dude, Mar 14 '23 at 20:38
You can also change the *packing* (or alignment requirement) of structures so that the compiler doesn't add padding. But note that unaligned variable access can be slower, or on some CPUs not possible at all. — Some programmer dude, Mar 14 '23 at 20:40
I also think that you have serious project problems if you have so many global variables. — 0___________, Mar 14 '23 at 20:41
If you mean that the linker just throws away the compiler-generated padding and alignment when creating the `.data` and `.bss` segments, then I somehow doubt that. The linker will likely add padding between data from different translation units, but that should be rather minor. If you want to avoid even that, there are two possible workarounds: decrease the number of global variables (a good idea in general), or use (possibly packed) structures for all global variables. — Some programmer dude, Mar 14 '23 at 20:45
If you are indeed already getting size-based grouping of your variables on a file-by-file basis, as you say, with only 1-, 2-, and 4-byte alignment to consider, then the worst loss you could be seeing is 3 bytes per file, and the average would be half that. I know you're targeting embedded, but do you really have so many source files that 1.5 bytes per file adds up to 3% of available memory? If so, then how can you have enough space for all the actual variables? Plus a stack? — John Bollinger, Mar 14 '23 at 20:46
How about using `__attribute__((aligned(X)))`, with X being 1, 2 or 4, when each variable is declared? Also, make sure the structures are packed (`__attribute__((packed))`) to avoid padding. — jvieira88, Mar 14 '23 at 20:55
Does the `--sort-section=alignment` option to `ld` do what you need? If you're using `gcc` to invoke the linker you'll need to do `-Wl,--sort-section=alignment`. — pmacfarlane, Mar 14 '23 at 21:12
As @JohnBollinger commented, I'd read your posted description of how memory layout is being handled as implying that your padding losses between global variables are insignificant. Perhaps there is some misunderstanding here. OTOH, what I anticipated, but you did not mention, is losses due to padding within a struct layout. This may be reducible by rearranging the order of struct members, as long as that doesn't violate 3rd-party library or meta requirements. — Avi Berger, Mar 14 '23 at 21:17
Another thing to consider when arranging your globals is their cache locality. While better packing may help reduce wastage, you need to be careful about where they're located. If you're accessing some data often, you want it grouped together in memory to increase the chances of cache hits. You certainly don't want to take a bunch of frequently accessed data and move it further apart. — paddy, Mar 14 '23 at 23:22
I would also add that I am not talking about padding within data structures. We pack data structures when appropriate and sort structure members by size to avoid padding within a structure. The extra bytes I am trying to remove are inserted by the linker as it follows alignment rules. Structure padding is visible in the linker map when a named variable could in theory be represented by 9 bytes but appears as 12 bytes in the map. The RAM I am focusing on here is the gaps between named variables, which are neither named nor accessed by the CPU. — user2246302, Mar 15 '23 at 15:06
Paddy - I agree that cache considerations could be important. This class of processors is not so advanced that it would be a concern in this case. All RAM is internal to the ARM Cortex M33 silicon, no external RAM or Flash is present. All RAM and Flash are accessible in zero wait states at the clock speeds used. There is no cache available. — user2246302, Mar 16 '23 at 00:54
For those who assume all RAM variables must be global variables, this is an incorrect assumption. There are no global variables in the classic sense: all are limited in scope to at most file level (i.e. declared with the static keyword), and to narrower scope when appropriate. Variables are persistent, not global. I agree there are many variables; like I said, it's a substantial embedded project. There is one stack, used in the traditional way. Total available RAM is 32 Kbytes, and total available Flash is 512 Kbytes for code and const data. There is also 8 Mbytes in a SPI Flash chip. — user2246302, Mar 16 '23 at 01:21
@user2246302: The C term for objects that persist through execution of a program is *static*. You are correct, C does not have any global name space (except for keywords). Identifiers can have file scope and external linkage, which are different from global, even when combined. — Eric Postpischil, Mar 16 '23 at 01:25
Eric P: As I am sure you are aware, removing the *static* keyword from variables declared at file scope makes them global (i.e. a public symbol usable in all files across the project). This was my intended meaning with the phrase "in the classic sense". There never was the concept of namespaces in the C language; that is something C++ introduced. My apologies for any confusion I may have caused in previous comments. — user2246302, Mar 16 '23 at 01:38
@user2246302: No, it does not make them global. It gives them external linkage, which is different. C does have namespaces; they are discussed in the C standard. They are not as flexible as C++ namespaces. None of them is global. With a global namespace, one declaration makes a name known throughout a program. C does not have that; to be known throughout a program, a name must be declared in each translation unit. — Eric Postpischil, Mar 16 '23 at 02:13
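Several comments suggest reordering or packing structure members. A minimal sketch of the difference, assuming a GCC-compatible compiler and an ABI where `uint32_t` is 4-byte aligned (as in the Arm EABI); the struct names are hypothetical:

```c
#include <stdint.h>

/* Declaration order as written: the compiler pads after 'flag' so that
   'count' is 4-byte aligned, and pads at the end so that arrays of the
   struct keep 'count' aligned. */
struct unordered {
    uint8_t  flag;
    uint32_t count;
    uint8_t  mode;
};

/* Same members sorted by decreasing alignment: only tail padding remains
   (4 + 1 + 1 bytes, rounded up to a multiple of 4). */
struct sorted {
    uint32_t count;
    uint8_t  flag;
    uint8_t  mode;
};

/* GCC 'packed' attribute: no padding at all, but 'count' may be
   misaligned, so accesses can be slower and a pointer to it must not be
   treated as an ordinary aligned uint32_t pointer. */
struct __attribute__((packed)) packed_unordered {
    uint8_t  flag;
    uint32_t count;
    uint8_t  mode;
};
```

On such a target `sizeof` reports 12, 8 and 6 bytes respectively. As the question's author points out, this only addresses padding inside structures, not the gaps the linker leaves between separate variables.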

score 3 · Answer 1 · answered Mar 14 '23 at 21:32


You can tell the linker to sort by alignment, either globally, or per-section. To do it for all sections, add:

--sort-section=alignment

to the flags that you pass to ld.

If you are using gcc to invoke the linker, you'd need to add this to the flags used for the link step (often LDFLAGS rather than CFLAGS):

-Wl,--sort-section=alignment

Alternatively, you can do it on a per-section basis in your linker script, e.g. change:

*(.data*)

to

*(SORT_BY_ALIGNMENT(.data*))

and ditto for the .bss section if desired.
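One assumption worth checking: the linker sorts *input sections*, not individual variables. The objects therefore need to be compiled with `-fdata-sections`, so that each variable is emitted as its own input section (`.data.<name>`); otherwise a file's whole `.data` is a single input section and can only be moved as a unit. A minimal test file for comparing builds with and without sorting might look like this (the variable names and `print_layout()` are hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/* Deliberately interleaved 1-, 4- and 2-byte objects. The initializers
   keep them in .data; with -fdata-sections each one becomes its own input
   section (.data.byte_a, .data.word_a, ...). */
uint8_t  byte_a = 1;
uint32_t word_a = 2;
uint8_t  byte_b = 3;
uint16_t half_a = 4;
uint8_t  byte_c = 5;
uint32_t word_b = 6;

/* Print one object's name, size and address. */
#define SHOW(v) printf("%-6s size %u at %p\n", #v, (unsigned)sizeof(v), (void *)&(v))

/* Call from main() (or read the .map file instead) and compare the
   addresses of builds with and without -Wl,--sort-section=alignment. */
void print_layout(void)
{
    SHOW(byte_a);
    SHOW(word_a);
    SHOW(byte_b);
    SHOW(half_a);
    SHOW(byte_c);
    SHOW(word_b);
}
```

With `-fdata-sections -Wl,--sort-section=alignment -Wl,-Map=test.map`, the map file should list the 4-byte-aligned `.data.word_*` sections ahead of `.data.half_a` and the 1-byte `.data.byte_*` sections.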


pmacfarlane


An excellent answer to the question posed. Not that I think the OP asked for something that will actually help them much. – John Bollinger Mar 14 '23 at 22:01
The only problem is that the Arm toolchain does not recognize this option: `error: unrecognized command-line option '--sort-section=alignment'; did you mean '--limit-function-alignment'?` – 0___________ Mar 14 '23 at 22:12
@JohnBollinger I tried it before, but the problem is that the Arm GCC toolchain does not recognize this option – 0___________ Mar 14 '23 at 22:15
0___ you forgot the `-Wl,` – pmacfarlane Mar 14 '23 at 22:15
@pmacfarlane it is an error message from the linker. It does not include the gcc part. I did not forget – 0___________ Mar 14 '23 at 22:16
Works fine for me in a recent-ish CubeIDE. I did test it before answering. – pmacfarlane Mar 14 '23 at 22:18
`--limit-function-alignment` appears in the manpage for `gcc`, but not for `ld`. – pmacfarlane Mar 14 '23 at 22:21
@pmacfarlane show me your full command line options – 0___________ Mar 14 '23 at 22:24
`arm-none-eabi-gcc -o "test-comp.elf" @"objects.list" -mcpu=cortex-m0 -T"/home/philip/projects/test-comp/STM32F030C8TX_FLASH.ld" --specs=nosys.specs -Wl,-Map="test-comp.map" -Wl,--gc-sections -static -Wl,--sort-section=alignment --specs=nano.specs -mfloat-abi=soft -mthumb -Wl,--start-group -lc -lm -Wl,--end-group` – pmacfarlane Mar 14 '23 at 22:31
The second option (SORT_BY_ALIGNMENT) does not work either: https://i.stack.imgur.com/OexpC.png – 0___________ Mar 14 '23 at 22:38
@0___________ I'm sorry you can't make it work, but it works for me. I have tested both of my proposed solutions and they both work fine. STM32CubeIDE 1.7.0 if you want to try it. – pmacfarlane Mar 14 '23 at 22:52

Answer 2 · answered Mar 14 '23 at 21:10 by 0___________

I would like to somehow instruct the compiler and linker to put all variables of one byte into a single linker section, two bytes into another linker section

In the linker script, define sections for data of each specific size. This example covers only 1- and 2-byte data, as those are the sizes likely to cause padding.

  /* 1-byte variables: run address in RAM, initial image in FLASH */
  _sidata8 = LOADADDR(.data8);
  .data8 : ALIGN(1)
  {
    _sdata8 = .;
    KEEP(*(.data8))
    KEEP(*(.data8*))
    _edata8 = .;
  } >RAM AT> FLASH

  /* 2-byte variables */
  _sidata16 = LOADADDR(.data16);
  .data16 : ALIGN(2)
  {
    _sdata16 = .;
    KEEP(*(.data16))
    KEEP(*(.data16*))
    _edata16 = .;
  } >RAM AT> FLASH

(KEEP is used only because the example variables are never referenced; without it, the linker's garbage collection (--gc-sections) would very likely discard them.)

When you declare the variables, place them in the correct sections:

char   __attribute__((section(".data8"))) a;
short __attribute__((section(".data16"))) d;
char   __attribute__((section(".data8"))) b;
short __attribute__((section(".data16"))) e;
char   __attribute__((section(".data8"))) c;
short __attribute__((section(".data16"))) f;


Remember that you need to add some startup code to initialize or zero your specific sections.

Example

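#if defined(__GNUC__)

#include <stdint.h>
#include <string.h>

/* Symbols defined in the linker script above */
extern uint8_t _sdata8[];
extern uint8_t _sdata16[];
extern uint8_t _edata8[];
extern uint8_t _edata16[];
extern uint8_t _sidata8[];
extern uint8_t _sidata16[];

/* Runs before main(): copy the initial values of .data8 and .data16
   from their load address in FLASH to their run address in RAM. */
static void __attribute__((constructor)) initDatax(void)
{
    memcpy(_sdata8, _sidata8, (size_t)(_edata8 - _sdata8));
    memcpy(_sdata16, _sidata16, (size_t)(_edata16 - _sdata16));
}

#endif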
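Two caveats about the constructor approach: it only copies initialized sections (a zero-initialized counterpart such as a `.bss8` would also need clearing), and constructors run from `__libc_init_array()`, so their order relative to other constructors that might read these variables is not guaranteed. A hedged alternative, assuming a typical vendor startup file you can edit, is to call plain helpers directly from `Reset_Handler` before `__libc_init_array()`. The helper names and the `.bss8` symbols `_sbss8`/`_ebss8` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an initialized section's image from its load address (FLASH) to its
   run address (RAM), e.g. copy_data_section(_sdata8, _edata8, _sidata8). */
static void copy_data_section(uint8_t *start, const uint8_t *end,
                              const uint8_t *load)
{
    memcpy(start, load, (size_t)(end - start));
}

/* Clear a zero-initialized section, e.g. a hypothetical .bss8 bounded by
   _sbss8/_ebss8. */
static void zero_bss_section(uint8_t *start, const uint8_t *end)
{
    memset(start, 0, (size_t)(end - start));
}

/* Hypothetical hook, called from Reset_Handler after the standard .data/.bss
   setup and before __libc_init_array():

   extern uint8_t _sdata8[], _edata8[], _sidata8[];
   extern uint8_t _sdata16[], _edata16[], _sidata16[];
   extern uint8_t _sbss8[], _ebss8[];

   void init_sized_sections(void)
   {
       copy_data_section(_sdata8, _edata8, _sidata8);
       copy_data_section(_sdata16, _edata16, _sidata16);
       zero_bss_section(_sbss8, _ebss8);
   }
*/
```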

This requires some work (adding attributes to every declaration, which macro definitions can shorten), but it lets you organize your data to avoid padding between variables. It works project-wide: any data placed in .data8, from any source file, ends up in this section.
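A minimal sketch of the macro shortening; the macro names `RAM_DATA8`/`RAM_DATA16` and the variables are hypothetical, and the section names must match the output sections in the linker script.

```c
#include <stdint.h>

/* Hypothetical shorthand for the section attributes. */
#define RAM_DATA8   __attribute__((section(".data8")))
#define RAM_DATA16  __attribute__((section(".data16")))

/* Declarations read almost as before; the macro only changes placement.
   Each 1-byte variable lands in .data8 and each 2-byte one in .data16,
   so no padding is needed between them. */
RAM_DATA8  uint8_t  rx_flag;
RAM_DATA16 uint16_t rx_count;
RAM_DATA8  uint8_t  tx_flag;
RAM_DATA16 uint16_t tx_count;
RAM_DATA8  uint8_t  link_state;
```

File-scope `static` variables work the same way (`static RAM_DATA8 uint8_t rx_flag;`). The remaining manual step is picking the right macro for each variable's size.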

