I built CoreMark for AArch64 using aarch64-none-elf-gcc with the following options:

-mcpu=cortex-a57 -Wall -Wextra -g -O2

In the disassembled code I see many NOPs.

A few examples:

0000000040001540 <matrix_mul_const>:
    40001540:   13003c63    sxth    w3, w3
    40001544:   34000240    cbz w0, 4000158c <matrix_mul_const+0x4c>
    40001548:   2a0003e6    mov w6, w0
    4000154c:   52800007    mov w7, #0x0                    // #0
    40001550:   52800008    mov w8, #0x0                    // #0
    40001554:   d503201f    nop
    40001558:   2a0703e4    mov w4, w7
    4000155c:   d503201f    nop
    40001560:   78e45845    ldrsh   w5, [x2, w4, uxtw #1]
    ...

00000000400013a0 <core_init_matrix>:
    400013a0:   7100005f    cmp w2, #0x0
    400013a4:   2a0003e6    mov w6, w0
    400013a8:   1a9f1442    csinc   w2, w2, wzr, ne // ne = any
    400013ac:   52800004    mov w4, #0x0                    // #0
    400013b0:   34000620    cbz w0, 40001474 <core_init_matrix+0xd4>
    400013b4:   d503201f    nop
    400013b8:   2a0403e0    mov w0, w4
    400013bc:   11000484    add w4, w4, #0x1

A simple question: what are these NOPs used for?


UPD. Yes, it is related to alignment. Here is the corresponding generated assembly code:

matrix_mul_const:
.LVL41:
.LFB4:
        .loc 1 270 1 is_stmt 1 view -0
        .cfi_startproc
        .loc 1 271 5 view .LVU127
        .loc 1 272 5 view .LVU128
        .loc 1 272 19 view .LVU129
        .loc 1 270 1 is_stmt 0 view .LVU130
        sxth    w3, w3
        .loc 1 272 19 view .LVU131
        cbz     w0, .L25
        .loc 1 276 51 view .LVU132
        mov     w6, w0
        mov     w7, 0
        .loc 1 272 12 view .LVU133
        mov     w8, 0
.LVL42:
        .p2align 3,,7
.L27:
        .loc 1 274 23 is_stmt 1 view .LVU134
        .loc 1 270 1 is_stmt 0 view .LVU135
        mov     w4, w7
.LVL43:
        .p2align 3,,7
.L28:
        .loc 1 276 13 is_stmt 1 discriminator 3 view .LVU136
        .loc 1 276 28 is_stmt 0 discriminator 3 view .LVU137
        ldrsh   w5, [x2, w4, uxtw 1]

Here we see .p2align 3,,7. These .p2align directives are a result of -O2:

$ aarch64-none-elf-gcc -Wall -Wextra -g -O1 -ffreestanding -c core_matrix.c -S ;\
  grep '.p2align' core_matrix.s | sort | uniq
<nothing>

$ aarch64-none-elf-gcc -Wall -Wextra -g -O2 -ffreestanding -c core_matrix.c -S ;\
  grep '.p2align' core_matrix.s | sort | uniq
        .p2align 2,,3
        .p2align 3,,7
        .p2align 4,,11
pmor
  • Can we please have the source code for those functions? – Nate Eldredge Jan 30 '23 at 17:24
  • Are they really nops, or does the disassembler not know what they are? That seems like a lot of bits to describe a nop; I would expect more zeros, with the non-zero bits up front. – old_timer Jan 30 '23 at 19:37
  • hmmm, disassembles as a nop for me too... – old_timer Jan 30 '23 at 19:39
  • From the docs, though, it looks like an msr. Why would one walk through the encoding of an msr that deeply, only to land in the otherwise UnallocatedEncoding();? op1 is 011, which is one of the two cases used, but op2 is 000, which is not used (or is it in a later spec?). And does unallocated then mean nop or undefined? The doc indicates UndefinedFault, from how I read it. Does everyone else see what I am seeing? – old_timer Jan 30 '23 at 19:57
  • It looks right to me, the genuine `NOP` instruction. You're right that the encoding is kind of unexpected. It's in the "Hints" class, under the group of "Branches, Exception Generating and System instructions". It is kind of a catch-all group that also includes system instructions, but "Hints" are distinguished from "System register move" by bit 20. An msr/mrs would have a 1 in bit 20. – Nate Eldredge Jan 30 '23 at 22:02
  • All the `nop`s are placed so that the next instruction is aligned to 8 bytes. Check your compile options. Try removing the `-mcpu` first. – Jake 'Alquimista' LEE Jan 31 '23 at 01:44
  • @Jake'Alquimista'LEE: Oh, good spot. So for instance, is 0x40001558 a branch target? – Nate Eldredge Jan 31 '23 at 04:49
  • @NateEldredge maybe. I've noticed that compilers place some `nop`s for alignment, especially if you specify a target microarchitecture. Maybe it's pipeline related. – Jake 'Alquimista' LEE Jan 31 '23 at 04:54
  • @NateEldredge Source code: https://github.com/eembc/coremark. – pmor Jan 31 '23 at 13:41
  • @Jake'Alquimista'LEE Yes, it is related to alignment. See UPD. – pmor Jan 31 '23 at 13:46
  • @pmor can you please post this as answer? – Étienne Jun 02 '23 at 09:09

1 Answer

A simple question: what are these NOPs used for?

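These NOPs are a result of optimization (see below): they are padding that aligns the next instruction to an 8-byte boundary. The padding comes from assembler directives such as .p2align 3,,7, which aligns to 2^3 = 8 bytes provided that no more than 7 bytes have to be skipped.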
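
As a sanity check on the addresses in the question, the padding rule can be sketched in C. This is only an illustration of how the assembler treats `.p2align power,,max_skip` as I understand it; the function names `p2align_padding` and `nop_count` are made up for this example and are not part of GCC, binutils, or CoreMark.

```c
#include <stdint.h>

/*
 * Bytes of padding inserted for `.p2align power,,max_skip` when the
 * location counter is `addr`. Returns 0 if the address is already
 * aligned, or if the padding needed exceeds max_skip (in that case
 * the alignment is not done at all).
 */
uint64_t p2align_padding(uint64_t addr, unsigned power, uint64_t max_skip)
{
    uint64_t align = (uint64_t)1 << power;                 /* 2^power bytes */
    uint64_t pad = (align - (addr & (align - 1))) & (align - 1);
    return pad <= max_skip ? pad : 0;
}

/* Every AArch64 instruction is 4 bytes, so each NOP fills 4 bytes of padding. */
uint64_t nop_count(uint64_t addr, unsigned power, uint64_t max_skip)
{
    return p2align_padding(addr, power, max_skip) / 4;
}
```

For `matrix_mul_const`, the location counter is 0x40001554 when the first `.p2align 3,,7` is reached, so `p2align_padding(0x40001554, 3, 7)` gives 4 bytes, i.e. one NOP; the same happens again at 0x4000155c.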

Thanks to user Jake 'Alquimista' LEE.

-O1 produces no .p2align directives:

$ aarch64-none-elf-gcc -Wall -Wextra -g -O1 -ffreestanding -c core_matrix.c -S ;\
  grep '.p2align' core_matrix.s | sort | uniq
<nothing>

-O2 produces .p2align directives:

$ aarch64-none-elf-gcc -Wall -Wextra -g -O2 -ffreestanding -c core_matrix.c -S ;\
  grep '.p2align' core_matrix.s | sort | uniq
        .p2align 2,,3
        .p2align 3,,7
        .p2align 4,,11

Note: core_matrix.c is part of CoreMark.
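
To reproduce the effect without the whole CoreMark tree, a small nested loop is usually enough. The function below is a hypothetical stand-in (the name `scale_matrix` and its signature are mine, not CoreMark's). Compiling it with `-O2 -S` should show `.p2align` directives before the loop labels, while adding `-fno-align-functions -fno-align-jumps -fno-align-labels -fno-align-loops` should remove most or all of them. The exact directives depend on the GCC version and on `-mcpu`/`-mtune`, so treat this as a sketch rather than a guaranteed result.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a CoreMark-style matrix routine:
 * multiplies every element of the n x n matrix `a` by `val`
 * and stores the 32-bit products in `c`. The nested loops give
 * the compiler loop heads that it may choose to align.
 */
void scale_matrix(uint32_t n, int32_t *c, const int16_t *a, int16_t val)
{
    for (uint32_t i = 0; i < n; i++) {
        for (uint32_t j = 0; j < n; j++) {
            c[i * n + j] = (int32_t)a[i * n + j] * (int32_t)val;
        }
    }
}
```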

pmor