How to force ARM GCC not to use 32-bit accesses on unaligned memory

Question

I am working with a memory that cannot handle 32-bit accesses at unaligned addresses. At unaligned addresses the memory supports only 8-bit accesses.

In my code there is a memcpy. When I passed an unaligned address to it, the chip got stuck. Looking deeper, I found that the code generated for memcpy performs 32-bit accesses regardless of whether the given address is 32-bit aligned. When I reduced the optimization level to -O2, the compiler generated code that always performs 8-bit accesses.

[Edit]: Below is the memcpy code I am using:

#include <stddef.h>

void* memcpy(void * restrict s1, const void * restrict s2, size_t n)
{
    char* ll = (char*)s1;
    const char* rr = (const char*)s2;   /* keep the source const-qualified */
    for (size_t i = 0; i < n; i++) ll[i] = rr[i];
    return s1;
}

Below is the disassembly of the code, with the source interleaved (the function is named memcpy3 here):

void* memcpy3(void *s1, void *s2, size_t n)
{
char* ll = (char*)s1;
char* rr = (char*)s2;
for (size_t i = 0; i < n; i++) ll[i] = rr[i];
0:  b38a        cbz r2, 66 <memcpy3+0x66>
{
2:  b4f0        push    {r4, r5, r6, r7}
4:  1d03        adds    r3, r0, #4
6:  1d0c        adds    r4, r1, #4
8:  42a0        cmp r0, r4
a:  bf38        it  cc
c:  4299        cmpcc   r1, r3
e:  d31e        bcc.n   4e <memcpy3+0x4e>
10: 2a08        cmp r2, #8
12: d91c        bls.n   4e <memcpy3+0x4e>
14: 460d        mov r5, r1
16: 4604        mov r4, r0
  for (size_t i = 0; i < n; i++) ll[i] = rr[i];
18: 2300        movs    r3, #0
1a: 0897        lsrs    r7, r2, #2
1c: f855 6b04   ldr.w   r6, [r5], #4
20: 3301        adds    r3, #1
22: 429f        cmp r7, r3
24: f844 6b04   str.w   r6, [r4], #4
28: d8f8        bhi.n   1c <memcpy3+0x1c>
2a: f022 0303   bic.w   r3, r2, #3
2e: 429a        cmp r2, r3
30: d00b        beq.n   4a <memcpy3+0x4a>
32: 56cd        ldrsb   r5, [r1, r3]
34: 1c5c        adds    r4, r3, #1
36: 42a2        cmp r2, r4
38: 54c5        strb    r5, [r0, r3]
3a: d906        bls.n   4a <memcpy3+0x4a>
3c: 570d        ldrsb   r5, [r1, r4]
3e: 3302        adds    r3, #2
40: 429a        cmp r2, r3
42: 5505        strb    r5, [r0, r4]
44: d901        bls.n   4a <memcpy3+0x4a>
46: 56ca        ldrsb   r2, [r1, r3]
48: 54c2        strb    r2, [r0, r3]
  return s1;
}
4a: bcf0        pop {r4, r5, r6, r7}
4c: 4770        bx  lr
4e: 3a01        subs    r2, #1
50: 440a        add r2, r1
52: 1e43        subs    r3, r0, #1
54: 3901        subs    r1, #1
  for (size_t i = 0; i < n; i++) ll[i] = rr[i];
56: f911 4f01   ldrsb.w r4, [r1, #1]!
5a: 4291        cmp r1, r2
5c: f803 4f01   strb.w  r4, [r3, #1]!
60: d1f9        bne.n   56 <memcpy3+0x56>
}
62: bcf0        pop {r4, r5, r6, r7}
64: 4770        bx  lr
66: 4770        bx  lr
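To make the listing easier to follow, here is a C rendering of its control flow. It is an illustration only (the name memcpy3_as_compiled is hypothetical): if the buffers are within 4 bytes of each other or n <= 8, it uses the byte loop at 0x56; otherwise it runs the word loop at 0x1c (ldr.w/str.w with post-increment) for n/4 words and finishes with up to 3 single bytes. Note that nothing checks the alignment of either pointer before the word loop, which is where the unaligned 32-bit accesses come from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a C rendering of what GCC generated for memcpy3. */
void *memcpy3_as_compiled(void *s1, const void *s2, size_t n)
{
    unsigned char *d = s1;
    const unsigned char *s = s2;

    if (n == 0)                       /* 0x00: cbz r2 */
        return s1;

    /* 0x04-0x12: byte loop if the buffers are within 4 bytes of each
       other (possible overlap) or if the copy is short (n <= 8). */
    int close_together = (uintptr_t)d < (uintptr_t)s + 4 &&
                         (uintptr_t)s < (uintptr_t)d + 4;
    if (close_together || n <= 8) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return s1;
    }

    /* 0x1c-0x28: n/4 iterations of a 32-bit load and store.
       No alignment check on d or s. With -munaligned-access in effect,
       GCC typically turns each fixed 4-byte memcpy into one ldr/str. */
    for (size_t w = 0; w < n / 4; w++) {
        uint32_t tmp;
        memcpy(&tmp, s + 4 * w, 4);
        memcpy(d + 4 * w, &tmp, 4);
    }

    /* 0x2a-0x48: the remaining 0-3 bytes, one at a time. */
    for (size_t i = n & ~(size_t)3; i < n; i++)
        d[i] = s[i];
    return s1;
}
```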

Is it possible to configure arm-gcc not to use 32-bit accesses on unaligned addresses?

Maybe you should show us the code? BTW: memcpy() should work (on non-overlapping objects) — wildplasser, Jun 20 '21 at 09:08
I assume this is because it's memory mapped I/O? In that case, use `volatile` pointer variables, with the right granularity. — MicroVirus, Jun 20 '21 at 09:09
Do you compile glibc? memcpy does not change its behaviour depending on the flags. — 0___________, Jun 20 '21 at 09:39
Show this memcpy, as memcpy always first copies bytes until aligned, then copies native-size words, then the remaining bytes at the end. — 0___________, Jun 20 '21 at 10:03
Are you compiling your code with `-ffreestanding` flag? If not, `memcpy` can be replaced by the compiler with the builtin `memcpy` from the standard library. — Alex Lop., Jun 20 '21 at 10:26
@AlexLop. Yes, I made sure that it's using the function I wrote. I tried renaming the function to memcpy_new and using that one, and I am still facing this issue. — 0xAB1E, Jun 20 '21 at 10:28
Can you show the disassembly of this function after you build it? Compilers these days are smart: they can detect the `memcpy` pattern and replace the code with a call to the standard `memcpy`, like here: https://godbolt.org/z/d6vKd8r3n — Alex Lop., Jun 20 '21 at 10:36
Can you compile your code with `restrict`, as in the C example, and show the updated assembly code? Some of the assembly code accounts for a potential overlap of the buffers. Besides that, I see that the compiler uses `ldrb` and `strb`, which load and store a single byte. — Alex Lop., Jun 20 '21 at 11:01
To provide a minimal reproducible example, show the exact flags you use to compile and the complete source code, so that the generated assembly can be reproduced on [Compiler Explorer](https://godbolt.org/z/4xxqcK1bT). The code currently shown in the question does not compile because `size_t` is not defined. While it may seem nitpicky to require `#include ` to be shown, it is necessary to ensure the problem is reproduced exactly, and it is simple to do. — Eric Postpischil, Jun 20 '21 at 11:13
This is not a compiler thing; gcc has nothing to do with memcpy, which comes from the C library (glibc or whatever library you have chosen to use). You are directly or indirectly telling the linker which libraries/objects to link. — old_timer, Jun 20 '21 at 12:16
@old_timer: OP is compiling `memcpy` or an equivalent routine. — Eric Postpischil, Jun 20 '21 at 13:21
OP is ultimately in control of the linker, directly or indirectly. — old_timer, Jun 20 '21 at 17:39
@old_timer: Which is not relevant. OP’s problem is not in the code in the library. The problem they report is the compiler generating aligned load/store instructions when `memcpy` or an equivalent routine is compiled. — Eric Postpischil, Jun 20 '21 at 17:46
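The head/body/tail scheme described in the comments can be combined with the `volatile` suggestion to get a copy that never issues an unaligned 32-bit access, independent of optimization flags. This is a sketch, not the OP's code: the name memcpy_aligned_words is hypothetical, and the may_alias typedef is a GCC/Clang extension. Words are used only when source and destination share the same misalignment (so aligning one aligns both); otherwise every byte is copied individually. Because all accesses are volatile, the compiler may not merge the byte loops into wider loads/stores or replace them with a library memcpy call, at the cost of being slower than an optimized memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type allowed to alias any object. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Hypothetical name. Bytes until the destination is 4-byte aligned,
   then aligned 32-bit words, then the leftover bytes. */
void *memcpy_aligned_words(void *restrict s1, const void *restrict s2, size_t n)
{
    /* volatile: every access is emitted exactly as written, never widened */
    volatile unsigned char *d = s1;
    const volatile unsigned char *s = s2;

    /* Word copies only if d and s have the same offset within a word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        /* Head: single bytes until d (and hence s) is word aligned. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: aligned 32-bit loads and stores. */
        while (n >= 4) {
            *(volatile word_t *)d = *(const volatile word_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Tail, or the whole copy when the alignments differ: single bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return s1;
}
```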

0xAB1E · Accepted Answer · 2021-06-22T11:00:17.887

0

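Use the -mno-unaligned-access flag to tell the compiler not to generate unaligned accesses. By default the compiler uses -munaligned-access. (GCC enables it by default for architectures that support unaligned accesses, such as ARMv7-M; it is disabled by default for pre-ARMv6, ARMv6-M and ARMv8-M Baseline.)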
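A build-time guard can catch a build that forgets the flag. This sketch assumes GCC or Clang, which define the ACLE macro `__ARM_FEATURE_UNALIGNED` only when unaligned accesses are enabled, so compiling for 32-bit ARM (`__arm__`) without -mno-unaligned-access stops with an error. The helper built_with_unaligned_access is hypothetical, for reporting the setting at run time.

```c
#include <stdbool.h>

/* Refuse to build for 32-bit ARM if the compiler may emit unaligned
   word accesses (i.e. -munaligned-access is in effect). */
#if defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)
#error "compile with -mno-unaligned-access: this memory cannot take unaligned 32-bit accesses"
#endif

/* Hypothetical helper: reports whether this translation unit was
   compiled with unaligned accesses enabled. */
static bool built_with_unaligned_access(void)
{
#ifdef __ARM_FEATURE_UNALIGNED
    return true;
#else
    return false;
#endif
}
```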

edited Jun 22 '21 at 11:00

answered Jun 21 '21 at 06:18

0xAB1E

It would be great if you could add some more information here, such as which architecture you are using. I ask because the compiler sets -march to armv4t by default and doesn't set the unaligned-access flag (probably because armv4t doesn't support it). If I change it and test with armv7, e.g. `arm-none-eabi-gcc -Q -march=armv7 --help=target`, it shows the unaligned-access flag as enabled. – Rajnesh Jun 23 '21 at 11:17

score -1 · Answer 2 · answered Aug 16 '21 at 13:58

Use the "-mcpu=" flag to set the processor type, not "-march=", as it covers more of the options.

The processor determines whether unaligned accesses are allowed on the bus interface. But an unaligned access is split into smaller parts before it reaches the memory device. It really doesn't make sense to say that a memory does not support unaligned accesses, since it never sees them, regardless of what the core does.

David

I do not think this answers the question. Judging by the accepted answer, the compiler was generating unaligned accesses by default and had to be told not to. Changing to use `-mcpu=` would tell the compiler it could use a specific instruction set, perhaps one larger than it uses when only told the target architecture, but why would that change its use of unaligned loads and stores? If the processor is a member of the architecture, then targeting the architecture should not allow unaligned loads and stores if any member in it does not support them. – Eric Postpischil Aug 16 '21 at 23:07
As a general point, within the same processor architecture family, different members may have different details of supported features. If you know exactly which device you are targeting - and for embedded development, that is the norm - you should tell the compiler as accurately as possible. For many targets, including ARM, that means using the -mcpu flag. This gives better results than -march, without the inconvenience of also needing -mtune and other details. – David Aug 17 '21 at 13:42