gcc replaces loops with memcpy and memset

Question

I have the following simple program:

#define N 20
long c[N];
long a[N + N];

void f(void)
{
    long *s = c;
    long *p = a;
    while (p != a + N) *p++ = *s++;
    while (p != a + N + N) *p++ = 0;
}

I compile it with:

/usr/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-gcc -mthumb -O3 -o main.o -c main.c

gcc conveniently replaces the loops with memcpy and memset respectively:

00000000 <f>:
   0:   b570            push    {r4, r5, r6, lr}
   2:   4d07            ldr     r5, [pc, #28]   ; (20 <f+0x20>)
   4:   4c07            ldr     r4, [pc, #28]   ; (24 <f+0x24>)
   6:   002a            movs    r2, r5
   8:   4907            ldr     r1, [pc, #28]   ; (28 <f+0x28>)
   a:   0020            movs    r0, r4
   c:   f7ff fffe       bl      0 <memcpy>
  10:   1960            adds    r0, r4, r5
  12:   002a            movs    r2, r5
  14:   2100            movs    r1, #0
  16:   f7ff fffe       bl      0 <memset>
  1a:   bc70            pop     {r4, r5, r6}
  1c:   bc01            pop     {r0}
  1e:   4700            bx      r0

Evidently gcc decides that the library implementations are more efficient, which may or may not be true in any particular situation. How can this behavior be avoided when, for example, speed is not important and library calls are undesirable?
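
For reference, the disassembly above corresponds roughly to the hand-written C below. This is only a sketch of the transformation, not compiler output: the name f_equivalent is made up for illustration, and the byte count is written as N * sizeof(long), which would be 80 on this 32-bit ARM target where long is 4 bytes.

```c
#include <string.h>

#define N 20
long c[N];
long a[N + N];

/* Hypothetical hand-written counterpart of what gcc generates for f():
   each loop becomes a single library call. */
void f_equivalent(void)
{
    /* First loop: copy the N elements of c into a[0..N). */
    memcpy(a, c, N * sizeof(long));

    /* Second loop: zero a[N..2N). The destination is a + N elements,
       which matches "adds r0, r4, r5" (base address plus byte count). */
    memset(a + N, 0, N * sizeof(long));
}
```

Both calls receive the same length in r2, which is why the listing loads it once into r5 and reuses it.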

You've explicitly told the compiler to be aggressive with optimisation by using the `-O3` command line option - which is not the default setting. If you want the compiler to be less aggressive, use a different optimisation setting, or even no optimisation setting. — Peter, Oct 29 '17 at 05:07
This is almost a duplicate of https://stackoverflow.com/a/33818680/1162141 — technosaurus, Oct 29 '17 at 05:20
Actually, it is. Unfortunately, I could not find it with 'memcpy loop'. Lots of posts about 'what is faster?' Thanks. — A.K., Oct 29 '17 at 05:27
Possible duplicate of [Getting GCC to compile without inserting call to memcpy](https://stackoverflow.com/questions/6410595/getting-gcc-to-compile-without-inserting-call-to-memcpy) — vgru, Jan 23 '18 at 09:12

score 11 · Answer 1 · edited Jun 20 '20 at 09:12

Okay, searching through https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html reveals the following option:

-ftree-loop-distribute-patterns
Perform loop distribution of patterns that can be code generated with calls to a library. This flag is enabled by default at -O3.

Specifying -fno-tree-loop-distribute-patterns keeps gcc from turning these loops into standard library calls, seemingly without affecting other optimizations.
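
If changing the flags for the whole translation unit is not practical, the same option can probably be applied to a single function through GCC's optimize function attribute. The sketch below assumes GCC; the macro name NO_LOOP_TO_LIBCALL is made up for illustration, and it expands to nothing under Clang, which does not implement this attribute. Note that the GCC manual describes the optimize attribute as intended for debugging rather than production use, so treat this as an experiment, not a guarantee.

```c
#define N 20
long c[N];
long a[N + N];

/* Hypothetical helper macro (not part of any library): asks GCC to compile
   one function as if -fno-tree-loop-distribute-patterns had been given.
   Expands to nothing on compilers without the attribute, e.g. Clang. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

NO_LOOP_TO_LIBCALL
void f(void)
{
    long *s = c;
    long *p = a;
    /* With the attribute honored, GCC should keep both loops inline
       instead of emitting calls to memcpy and memset. */
    while (p != a + N) *p++ = *s++;
    while (p != a + N + N) *p++ = 0;
}
```

Whether the calls are really gone is best confirmed by disassembling the object file again, as in the question.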

answered Oct 29 '17 at 05:14

A.K.


Unfortunately, -fno-tree-loop-distribute-patterns does not work with Clang. Does anyone know a way to solve the same problem with Clang? – cepstr Nov 05 '19 at 12:36

score 0 · Answer 2 · answered Oct 29 '17 at 05:01

You are compiling with -O3, which enables gcc's most aggressive set of optimizations, including the one that turns these loops into library calls. Try a lower level such as -O2 or -O.

Mr. bug

