I have the following simple program:

#define N 20
long c[N];
long a[N + N];

void f(void)
{
    long *s = c;
    long *p = a;
    while (p != a + N) *p++ = *s++;
    while (p != a + N + N) *p++ = 0;
}

I compile it with:

/usr/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-gcc -mthumb -O3 -o main.o -c main.c

gcc conveniently replaces the loops with memcpy and memset respectively:

00000000 <f>:
   0:   b570            push    {r4, r5, r6, lr}
   2:   4d07            ldr     r5, [pc, #28]   ; (20 <f+0x20>)
   4:   4c07            ldr     r4, [pc, #28]   ; (24 <f+0x24>)
   6:   002a            movs    r2, r5
   8:   4907            ldr     r1, [pc, #28]   ; (28 <f+0x28>)
   a:   0020            movs    r0, r4
   c:   f7ff fffe       bl      0 <memcpy>
  10:   1960            adds    r0, r4, r5
  12:   002a            movs    r2, r5
  14:   2100            movs    r1, #0
  16:   f7ff fffe       bl      0 <memset>
  1a:   bc70            pop     {r4, r5, r6}
  1c:   bc01            pop     {r0}
  1e:   4700            bx      r0

Evidently, gcc decides that the library implementations are more efficient, which may or may not be true in any particular situation. How can this behavior be avoided when, for example, speed is not important and library calls are undesirable?

A.K.
  • You've explicitly told the compiler to be aggressive with optimisation by using the `-O3` command line option - which is not the default setting. If you want the compiler to be less aggressive, use a different optimisation setting, or even no optimisation setting. – Peter Oct 29 '17 at 05:07
  • This is almost a duplicate of https://stackoverflow.com/a/33818680/1162141 – technosaurus Oct 29 '17 at 05:20
  • Actually, it is. Unfortunately, I could not find it with 'memcpy loop'. Lots of posts about 'what is faster?' Thanks. – A.K. Oct 29 '17 at 05:27
  • Possible duplicate of [Getting GCC to compile without inserting call to memcpy](https://stackoverflow.com/questions/6410595/getting-gcc-to-compile-without-inserting-call-to-memcpy) – vgru Jan 23 '18 at 09:12

2 Answers

Okay, searching through https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html reveals the following option:

-ftree-loop-distribute-patterns

Perform loop distribution of patterns that can be code generated with calls to a library. This flag is enabled by default at -O3.

Specifying -fno-tree-loop-distribute-patterns stops gcc from replacing the loops with library calls, seemingly without affecting other optimizations.
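
If changing the flags for the whole translation unit is not desirable, the same option can probably be applied to a single function through GCC's `optimize` function attribute. The following is only a sketch: the macro name `NO_LOOP_TO_LIBCALL` is made up for illustration, and the GCC manual recommends the `optimize` attribute mainly for debugging, so the generated code should be checked with objdump as in the question.

```c
#define N 20
long c[N];
long a[N + N];

/* GCC only: request -fno-tree-loop-distribute-patterns for one function.
   NO_LOOP_TO_LIBCALL is a hypothetical name chosen for this sketch.
   On other compilers the macro expands to nothing, so the code still
   builds, but the loops may again be turned into library calls. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

NO_LOOP_TO_LIBCALL
void f(void)
{
    long *s = c;
    long *p = a;
    while (p != a + N) *p++ = *s++;      /* copy c into the first half of a */
    while (p != a + N + N) *p++ = 0;     /* zero the second half of a */
}
```

The rest of the file keeps whatever optimization level was given on the command line; only `f` is affected.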

A.K.
  • -fno-tree-loop-distribute-patterns does not work with Clang unfortunately. Does anyone know a way to solve the same problem with Clang? – cepstr Nov 05 '19 at 12:36
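
For compilers that do not accept that flag, one commonly used workaround is to put an empty inline-asm statement with a "memory" clobber inside each loop body. The compiler has to treat it as an opaque side effect, which should keep it from recognizing the loop as a memcpy or memset pattern. This is a sketch under assumptions: `LOOP_BARRIER` and `copy_then_zero` are hypothetical names, the asm syntax is the GNU extension (accepted by GCC and Clang), and the result has not been verified on the toolchain from the question, so the disassembly should be inspected.

```c
#include <stddef.h>

/* LOOP_BARRIER is a hypothetical helper for this sketch. The empty asm
   emits no instructions, but the "memory" clobber makes the compiler
   assume memory may be read or written at this point. */
#if defined(__GNUC__)
#define LOOP_BARRIER() __asm__ volatile("" ::: "memory")
#else
#define LOOP_BARRIER() ((void)0)   /* no effect on other compilers */
#endif

/* Copy n elements from src to dst, then zero the next n elements of dst.
   dst must have room for 2 * n elements. */
void copy_then_zero(long *dst, const long *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = src[i];
        LOOP_BARRIER();
    }
    for (i = n; i < 2 * n; i++) {
        dst[i] = 0;
        LOOP_BARRIER();
    }
}
```

The barrier also blocks vectorization of these loops, which fits the case in the question where speed is not important.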

You are using the -O3 flag, which enables gcc's most aggressive standard set of optimizations. Try a lower level such as -O2 or -O.

Mr. bug