1

I have been trying to run a GNU C project on an ARM Cortex-M3 processor. The project runs happily at the -Og optimisation level, but when I tried increasing the optimisation level to -O2 or -O3, I encountered bus faults.

The GNU toolchain was "arm-none-eabi V10.3.1".

I read the BFSR register, and it indicated a PRECISERR and a STKERR. The fault occurs in a self-implemented memset function, which was written because the project does not use the standard C libraries.

void* memset(void *s, int c, size_t len) {
    unsigned char *dst;
    dst = (unsigned char *) s;
    while (len > 0) {
        *dst = (unsigned char) c;
        dst++;
        len--;
    }
    return s;
}

Also, after going through the assembly for this function, I noticed that it is completely different for the -Og option (which works) and the -O2/-O3/-Os options (which crash).

Screenshots of the assembly for the two options were attached here: one for the -Og option and one for the -O2/-O3/-Os option.

I believe it is the return from this function that causes the STKERR. I have also seen a BL instruction (in the -O2/-O3/-Os build) which might be the root cause, since it places the address of the next instruction in the link register, and a subsequent pop into PC could fail?

But I was able to get around the issue with a small modification to the code, making the variables volatile. The new implementation is below.

void* memset(void *s, int c, size_t len) {
     unsigned char * volatile dst;
     volatile size_t count = 0;
     dst = (unsigned char * volatile) s;

    while (count <  len) {
        dst[count] = (unsigned char) c;
        count++;
    }
    return s;
}

I wanted to know whether this is a bug in the GNU toolchain?

The assembly of the problematic memset function here (-O2/-O3/-Os) :-

    .section    .text.memset,"ax",%progbits
    .align  1
    .p2align 2,,3
    .global memset
    .syntax unified
    .thumb
    .thumb_func
    .type   memset, %function
memset:
    .cfi_startproc
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    push    {r4, lr}
    mov r4, r0
    cbz r2, .L34
    uxtb    r1, r1
    bl  memset
    mov r0, r4
    pop {r4, pc}
    .cfi_endproc

Assembly of the memset function compiled with -Og option (which works)

    .section    .text.memset,"ax",%progbits
    .align  1
    .global memset
    .syntax unified
    .thumb
    .thumb_func
    .type   memset, %function
memset:
    .cfi_startproc
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    mov r3, r0
.L20:
    strb    r1, [r3], #1
    subs    r2, r2, #1
    cmp r2, #0
    bne .L20
    bx  lr
    .cfi_endproc
.LFE81:
    .size   memset, .-memset
  • void* memset(void *s, int c, size_t len) { unsigned char *dst; dst = (unsigned char*) s; while (len > 0) { *dst = (unsigned char) c; dst++; len--; } return s; } – user3786632 Nov 09 '22 at 09:03
  • Sorry made a typo in the original post, the original memset code pasted in the comments. – user3786632 Nov 09 '22 at 09:05
  • Even as a new contributor, you should have the right to edit your own question. So please do so instead of pasting unformatted code as a comment. – Codo Nov 09 '22 at 09:09
  • Thanks I have done that now. – user3786632 Nov 09 '22 at 13:54
  • Failure due to optimization does not automatically mean it is a compiler problem. You may have a bug in your code that is amplified by the compiler, or you may have a race condition in your code that is made worse by optimized code. – old_timer Nov 09 '22 at 14:06
  • Post text, not images. And if you think it is the compiler, then what did your debugging of the compiler output show? You need to do some debugging. – old_timer Nov 09 '22 at 14:06
  • What is the disassembly of memset? Perhaps it is as simple as an unaligned address with some memset optimization that expects an alignment. Basically we need to see a complete example, not fragments, and what debugging you have done. – old_timer Nov 09 '22 at 14:11
  • or are you implying that the compiler generated recursion that does not actually, functionally, implement the high level language code? – old_timer Nov 09 '22 at 14:13
  • Why are you writing your own `memset` instead of using the stdlib `memset`? Because what's happening in the failing code is the compiler is detecting what you're doing and replacing it with a call to the highly optimized library version. Or at least attempting to. – Mgetz Nov 09 '22 at 14:20
  • also what tool is this, debuggers we have seen can show the wrong stuff related to the C code as some inline thing, instead use a proper disassembler to show the output – old_timer Nov 09 '22 at 19:19
  • I am pasting the actual disassembly of the problematic code. @old_timer, This is a single threaded application so race conditions are out of the question . And for the concern around unaligned address , surely, casting the passed in address to a void* and casting it back to a char*, should take care of the addressing ? The snippet of the compiler generated assembly code – user3786632 Nov 10 '22 at 08:02
  • Assembly code of the function which was causing issue added in the query – user3786632 Nov 10 '22 at 08:16
  • @old_timer, the debugger is a Trace-32 device. – user3786632 Nov 10 '22 at 08:25
  • @Mgetz, yes, this project has a requirement that it shouldn't include any standard libraries. So in the linker options we have used the -nostartfiles -nodefaultlibs -nostdlib options, so there is minimal chance of the code jumping into an equivalent library version. Actually, if I comment out memset, I get a linker error saying no implementation of memset is present. – user3786632 Nov 10 '22 at 08:30
  • If you work with bigger structs, the compiler will generate call to memset() (for initialization) and to memcpy() (when copying structs). – Codo Nov 10 '22 at 08:39
  • You should be able to get around this by passing `-fno-tree-loop-distribute-patterns` to the compiler (possibly with an attribute. see https://stackoverflow.com/a/33818680 ) – Hasturkun Nov 10 '22 at 15:40
  • Rename your memset to another name like xmemset. It looks like the compiler is detecting and optimizing your code to call the real memset, but yours is called memset, so it now becomes an infinite recursion until it crashes – old_timer Nov 10 '22 at 15:40

2 Answers

2

Even with -ffreestanding, -nostdlib and similar options, there are certain "library functions" that GCC will nonetheless emit calls to, when it finds that it can optimize code into a call to one of those functions. memset is one of them. So GCC optimizes your memset function into a call to memset, which of course causes infinite recursion and a crash.

This is documented at https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Standards.html#Standards:

Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC emits a call to abort.

The expectation is that you will implement those functions in assembly language, and that you'd probably want to do so anyway so that you can use hand-optimized versions. As you noticed, you can get away with doing it in C by using tricks like volatile to suppress the optimization, but that's going to result in the most naive and least optimized version of memset and you probably won't be very happy with the performance.
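
As a rough illustration of the volatile trick, here is a minimal sketch that makes only the pointed-to bytes volatile, rather than the pointer variable and the counter as in the question's workaround. The name fw_memset_volatile is hypothetical and is used so the sketch can also be tried on a hosted system; in the freestanding project the function would be called memset. Because every store goes through a volatile lvalue, the compiler has to emit each byte store itself and cannot fold the loop into a library call, while the pointer and counter can still live in registers.

```c
#include <stddef.h>

/* Hypothetical name for illustration; in a freestanding build this
   would be the project's own memset. */
void *fw_memset_volatile(void *s, int c, size_t len)
{
    /* Only the destination bytes are volatile: each store must be
       performed as written, so the loop cannot be replaced by a
       call to memset. */
    volatile unsigned char *dst = (volatile unsigned char *)s;

    while (len > 0) {
        *dst = (unsigned char)c;
        dst++;
        len--;
    }
    return s;
}
```

This is still a byte-at-a-time loop, so the performance caveat above applies to it as well.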

The question "Getting GCC to compile without inserting call to memcpy" also suggests using -fno-tree-loop-distribute-patterns. Combined with -O3 it uses 16-byte SIMD stores instead of single-byte strb, so that's a considerable improvement: https://godbolt.org/z/K8645on1e. Some future compiler version may break this, however.
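
If changing the flags for the whole project is not desirable, the same option can be requested for a single function with GCC's optimize attribute, as hinted at in the comments on the question. The following is only a sketch under that assumption: fw_memset is a hypothetical name chosen so the code can be tried on a hosted system (in the freestanding project it would be memset), and the attribute is GCC-specific, so other compilers may ignore it with a warning.

```c
#include <stddef.h>

/* GCC-specific: compile this one function as if
   -fno-tree-loop-distribute-patterns had been given, so the store
   loop is not turned into a call to memset. */
__attribute__((optimize("-fno-tree-loop-distribute-patterns")))
void *fw_memset(void *s, int c, size_t len)
{
    unsigned char *dst = (unsigned char *)s;

    while (len > 0) {
        *dst = (unsigned char)c;
        dst++;
        len--;
    }
    return s;
}
```

The GCC manual describes the optimize attribute as intended mainly for debugging, so passing the flag on the command line for the file that holds these routines is probably the more robust choice. Either way, it is worth checking the generated assembly for a `bl memset` after a toolchain upgrade.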

Nate Eldredge
  • re: code output for the version with `-fno-tree-loop-distribute-patterns`, you left the `volatile` in there, but it also only gives an unrolled/write combining version at `-O3`, https://godbolt.org/z/K8645on1e – Hasturkun Nov 11 '22 at 07:55
  • @Hasturkun: Oops, good catch. I updated the answer accordingly. – Nate Eldredge Nov 11 '22 at 14:38
1

So we basically answered this yesterday in the comments, and then Nate answered it with how to get around it. That, or just do not use -O3; use -O2 instead (at least on the GNU toolchain I was using). In general, do not use -O3.

typedef __SIZE_TYPE__ size_t;
void* memset(void *s, int c, size_t len)
{
 unsigned char *dst; 
 dst = (unsigned char *)s;
 while (len > 0) {
     *dst = (unsigned char) c;
      dst++; 
      len--;
 } 
 return s; 
} 



00000000 <memset>:
   0:   b510        push    {r4, lr}
   2:   4604        mov r4, r0
   4:   b112        cbz r2, c <memset+0xc>
   6:   b2c9        uxtb    r1, r1
   8:   f7ff fffe   bl  0 <memset>
   c:   4620        mov r0, r4
   e:   bd10        pop {r4, pc}



typedef __SIZE_TYPE__ size_t;
void* fun(void *s, int c, size_t len)
{
 unsigned char *dst; 
 dst = (unsigned char *)s;
 while (len > 0) {
     *dst = (unsigned char) c;
      dst++; 
      len--;
 } 
 return s; 
} 


00000000 <fun>:
   0:   b510        push    {r4, lr}
   2:   4604        mov r4, r0
   4:   b112        cbz r2, c <fun+0xc>
   6:   b2c9        uxtb    r1, r1
   8:   f7ff fffe   bl  0 <memset>
   c:   4620        mov r0, r4
   e:   bd10        pop {r4, pc}
old_timer
  • Writing your own C library functions like these is a very good exercise, and it is definitely not worth sucking in all the baggage that comes with a C library to get a few of these functions. I recommend, if not doing a full C library, that you give them your own names: my_memset(), my_memcpy(), etc. – old_timer Nov 10 '22 at 16:01
  • You will also see things like struct variables a and b, and an a=b in the code causing a memcpy to be inserted, and you need to tell the tool not to use stdlib calls. See Nate's answer, which is the true answer here. – old_timer Nov 10 '22 at 16:02
  • `-O2` doesn't avoid the problem, it still calls `memset` recursively. https://godbolt.org/z/191ssrqd4. You have to go down to `-O1`. – Nate Eldredge Nov 10 '22 at 19:58
  • Thanks All, That explains it. I will close this thread. – user3786632 Nov 11 '22 at 07:56