How does the arm-none-eabi-as choose section alignment?

Question

I am playing with arm-none-eabi-as trying to understand how it aligns sections. I have the following source:

@ source.s
.text
.byte 0xff
.byte 0xff
.byte 0xff

I am inspecting the resulting object file:

$ arm-none-eabi-as -mthumb -o source.o source.s
$ arm-none-eabi-readelf -S source.o
There are 8 section headers, starting at offset 0xec:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000003 00  AX  0   0  1
  [ 2] .data             PROGBITS        00000000 000037 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          00000000 000037 000000 00  WA  0   0  1
  [ 4] .ARM.attributes   ARM_ATTRIBUTES  00000000 000037 000014 00      0   0  1
  [ 5] .symtab           SYMTAB          00000000 00004c 000060 10      6   6  4
  [ 6] .strtab           STRTAB          00000000 0000ac 000004 00      0   0  1
  [ 7] .shstrtab         STRTAB          00000000 0000b0 00003c 00      0   0  1

The .text section is byte-aligned and contains the 3 bytes.

Now, I add an instruction to source.s:

@ source.s
.text
.byte 0xff
nop
.byte 0xff
.byte 0xff

Looking into the object file, now all of a sudden the .text section is halfword-aligned:

There are 8 section headers, starting at offset 0x114:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000006 00  AX  0   0  2
  [ 2] .data             PROGBITS        00000000 00003a 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          00000000 00003a 000000 00  WA  0   0  1
  [ 4] .ARM.attributes   ARM_ATTRIBUTES  00000000 00003a 000014 00      0   0  1
  [ 5] .symtab           SYMTAB          00000000 000050 000080 10      6   8  4
  [ 6] .strtab           STRTAB          00000000 0000d0 000007 00      0   0  1
  [ 7] .shstrtab         STRTAB          00000000 0000d7 00003c 00      0   0  1

What is causing the assembler to decide to pad the section in the second case? I am confused because:

if the section is .data then the assembler will not pad it anyway, which makes sense, but
even if the section is .text, the assembler won't pad it unless it sees an instruction (I can have as many data directives, the section won't be padded without having also an instruction), and finally
the nop instruction is definitely not aligned and the assembler has no problem with it, but it still decides to care about section alignment.

How does the assembler decide to pad here? Can I force the assembler not to pad the .text section even if it contains an instruction?

https://sourceware.org/binutils/docs/as/Section.html only mentions an alignment override as part of `.section` for COFF, not ELF targets. IDK if there's anything ARM-specific in the manual (https://sourceware.org/binutils/docs/as/ARM_002dDependent.html) — Peter Cordes, Jun 19 '23 at 05:33

score 1 · Accepted Answer · answered Jun 20 '23 at 16:25

Here is what I learned after discussing it with the binutils folks:

The assembler does not care if one encodes misaligned instructions in any particular section. However, it cares about two aspects:

1. that the overall alignment of any section matches that of the most strictly aligned element within that section, and
2. that every section marked with the eXecute flag is padded to match its alignment.

The assembler enforces (2) because later those sections might need to be merged and after the merge all the elements within must keep their alignment (thus it pads the sections to ensure this property).

The difference between the two cases shown in the original post is that:

in the first case (without the instruction), the most aligned element was .byte with an alignment of 1, thus giving the entire section an alignment of 1, while
in the second case, the Thumb nop instruction, with an alignment of 2, becomes the most aligned element and the section is now 2-aligned.

Since the section in the example is a .text section and has the X flag set, the assembler will pad the section in the second case to match its alignment. Nevertheless, the nop instruction is still misaligned inside the section.
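The two rules can be modeled in a few lines of Python. This is only a sketch of the behaviour described above, not the actual gas implementation: the `Element` class and the helper names are made up for illustration, and the sizes assume the 2-byte Thumb nop from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical model of one item emitted into a section."""
    name: str
    size: int       # bytes emitted
    alignment: int  # alignment the assembler records for this item


BYTE = Element(".byte", 1, 1)
THUMB_NOP = Element("nop (Thumb)", 2, 2)


def section_alignment(elements):
    # Rule 1: the section takes the alignment of its most aligned element.
    return max((e.alignment for e in elements), default=1)


def section_size(elements, executable):
    # Elements are laid out back to back; nothing is aligned inside the section.
    raw = sum(e.size for e in elements)
    if not executable:
        return raw
    # Rule 2: an executable section is padded up to a multiple of its alignment.
    align = section_alignment(elements)
    return (raw + align - 1) // align * align


first = [BYTE, BYTE, BYTE]              # first listing in the question
second = [BYTE, THUMB_NOP, BYTE, BYTE]  # second listing in the question

print(section_alignment(first), section_size(first, executable=True))
print(section_alignment(second), section_size(second, executable=True))
```

The two lines printed are `1 3` and `2 6`, which match the `Al` and `Size` columns that readelf reports for .text in the two listings. With `executable=False` the second case stays at 5 bytes, which is what rule 2 implies for a section without the X flag.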

Alexandru N. Onea


This fits with everything I have observed. I think the main point is that the `NOP` has alignment 2. Also, if you used `.arm`, I think you would have alignment 4. So, probably important to note that this is for `.thumb` mode, which you have set with the `-mthumb` command line option. – artless noise Jun 20 '23 at 16:52
it made sense from the beginning that the padding was to make the instruction aligned......but the instruction was not aligned...did you try byte, nop, byte, byte, nop. or byte nop byte nop. if they pad for alignment but then dont actually align how are we supposed to know to shift on load? or was that elsewhere in the readelf output? – old_timer Jun 23 '23 at 19:42
The padding is there for section alignment and not for element alignment within a section. If elements are misaligned within a section that is a user bug, but what would be a toolchain bug is that sections that require elements to be aligned (like an executable section) - and the toolchain assumes we are not monkeys and have aligned the elements within - after they are merged no longer have those elements aligned. That's why the assembler pads sections to match the alignment so when they are merged, whatever was aligned within (correctly or not by the user) keeps that alignment afterwards. – Alexandru N. Onea Jun 25 '23 at 11:21

score 0 · Answer 2 · answered Jun 19 '23 at 16:34

How is the assembler deciding here to pad?

Instructions in Thumb(2) mode must start at a 16-bit-aligned offset. If you use .arm, you will see padding to 32 bits. This is part of the ISA. Your NOP instruction will not function if you somehow jump directly to it, and the rest of the stream is garbage anyway. In the normal case, where the NOP in a text section is code, you want it to be aligned. The low bit of a code pointer is typically used to denote whether the routine is Thumb or ARM.
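To illustrate the last point, here is a minimal Python sketch of how the low bit of a code pointer can be split off from the address. The function name and the example address are hypothetical; the sketch only shows the bit arithmetic, not how any particular core performs the branch.

```python
def decode_code_pointer(pointer):
    """Split a code pointer into (instruction address, is_thumb)."""
    is_thumb = bool(pointer & 1)  # bit 0 set: the target is Thumb code
    address = pointer & ~1        # the instruction itself starts at an even address
    return address, is_thumb


address, is_thumb = decode_code_pointer(0x08000101)
print(hex(address), is_thumb)
```

Because bit 0 carries the mode, the instruction address itself always has to be even, which is why a Thumb instruction at an odd offset cannot be reached as a normal branch target.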

Can I force the assembler to not pad the .text section even if I had an instruction?

It would be an odd case to have a NOP as data; if so, define it yourself if you don't want padding.

 .set UNALIGNED_NOP, 0xbf00  @ double check this constant.

You are using the tool in a completely wrong way. Use the '.rodata' section if you want data. The assembler's job is to convert human mnemonics to binary. In the normal case of a NOP in a text section, you want it to be executable. If the tools supported your use case, there would be dozens of other cases which would break. For instance,

   @ basic block
   @ xxxx
   b  label
   .word xxxx  @ ltorg, etc.
   .word yyyy
   .byte xx
  label:
    add r0, r0, #1 @ crash because it is not aligned.
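The offset arithmetic behind this example can be sketched in Python. The sketch assumes ARM mode (4-byte instructions and words) and that the snippet starts at offset 0 of the section; `align_up` is a hypothetical helper, and the elided `xxxx` part is left out.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# (item, size in bytes), assuming ARM mode
items = [("b label", 4), (".word xxxx", 4), (".word yyyy", 4), (".byte xx", 1)]

label_offset = sum(size for _, size in items)
print(label_offset, label_offset % 4 == 0)  # where 'label' lands without alignment
print(align_up(label_offset, 4))            # where it lands after aligning to 4
```

Without any alignment, `label` ends up at offset 13, which is not a multiple of 4; rounding up moves it to 16, at the cost of three padding bytes.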

I answered your question directly, but I really think you have a mis-understanding and wonder what it is you really want to do? — artless noise, Jun 19 '23 at 16:35
Actually, the example may crash; with some tool versions, I have had to insert an `.align` before 'label'. — artless noise, Jun 20 '23 at 12:01
I don't try to do anything useful, it is purely academic. Thanks for the answer. I will post my findings as a separate answer for everyone to benefit. — Alexandru N. Onea, Jun 20 '23 at 16:12
