13

I'm currently using GCC 4.5.3, compiled for PowerPC 440, and am compiling some code that doesn't require libc. I don't have any direct calls to memcpy(), but the compiler seems to be inserting one during the build.

There are linker options like -nostdlib, -nostartfiles and -nodefaultlibs, but I can't use them because I'm not doing the linking phase. I'm only compiling, with something like this:

$ powerpc-440-eabi-gcc -O2 -g -c -o output.o input.c

If I check the output.o with nm, I see a reference to memcpy:

$ powerpc-440-eabi-nm output.o | grep memcpy
     U memcpy
$ 

The GCC man page makes it clear how to remove calls to memcpy and other libc calls with the linker, but I don't want the compiler to insert them in the first place, as I'm using a completely different linker (not GNU's ld, and it doesn't know about libc).

Thanks for any help you can provide.

Brian
  • If nothing else works, a simple byte-by-byte, CPU-based implementation of memcpy sufficient at least for rarely-used cases is likely shorter than most of the answers posted here. – Chris Stratton Feb 18 '14 at 22:40

5 Answers

9

There is no need to use -fno-builtin or -ffreestanding, as they unnecessarily disable many important optimizations.

The call is actually introduced as an "optimization" by GCC's tree-loop-distribute-patterns pass, so to disable the unwanted behavior while keeping the useful builtin capabilities, you can just use:

-fno-tree-loop-distribute-patterns

musl libc uses this flag for its build and has the following note in its configure script (I looked through the source and didn't find any such macros, so the flag alone should be enough):

# Check for options that may be needed to prevent the compiler from
# generating self-referential versions of memcpy,, memmove, memcmp,
# and memset. Really, we should add a check to determine if this
# option is sufficient, and if not, add a macro to cripple these
# functions with volatile...
# tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns

You can also apply this to individual functions using GCC's optimize attribute, so that other functions can still benefit from the mem*() transformation:

__attribute__((optimize("no-tree-loop-distribute-patterns")))
size_t strlen(const char *s){ //without attribute, gcc compiles to jmp strlen
  size_t i = -1ull;
  do { ++i; } while (s[i]);
  return i;
}

Alternatively (at least for now), you can add an empty asm statement to your loop to thwart the pattern recognition:

size_t strlen(const char *s){
    size_t i = -1ull;
    do {
        ++i;
        asm("");
    } while (s[i]) ;
    return i;
}
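
The per-function attribute can be applied the same way to a zero-fill loop, which is the pattern GCC may otherwise turn into a call to memset. The following is only a sketch: `zero_region` is a hypothetical helper name, not something from GCC or musl, and whether the attribute is honoured depends on the compiler version. GCC's documentation describes the optimize attribute as intended for debugging, so the command-line flag is the more robust choice where it can be used.

```c
#include <stddef.h>

/* Hypothetical helper (assumption, not from the answer): zero-fills the
 * half-open range [start, end), as a low-level .bss init routine might.
 * Without the attribute, GCC's loop-distribution pass may replace the
 * loop with a call to memset. */
__attribute__((optimize("no-tree-loop-distribute-patterns")))
void zero_region(unsigned char *start, unsigned char *end)
{
    unsigned char *p;

    for (p = start; p < end; ++p)
        *p = 0;
}
```

After compiling, running nm on the object file shows whether a memset reference is still present.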
technosaurus
  • Thanks, it does make the unwanted `mem*()` disappear. Your suggestion to use a function attribute also works, so I'm just popping in an example here: `void __attribute__((optimize("-fno-tree-loop-distribute-patterns"))) init_bss(void) { /* low-level .bss init function */ }`. Also, I must add that this optimization is only turned on at `O3` as far as I can tell. – davidanderle Mar 31 '19 at 12:46
6

GCC emits calls to memcpy in some circumstances, for example when you copy a structure. There is no way to change this behaviour, but you can try to avoid it by modifying your code so that it doesn't perform such copies. Your best bet is to look at the assembly to figure out why GCC emitted the memcpy and then work around it. This is going to be annoying, though, since you basically need to understand how GCC works.

Extract from http://gcc.gnu.org/onlinedocs/gcc/Standards.html:

Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC will emit a call to abort.
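
If the environment has to supply these four routines itself, simple byte-wise versions are enough to satisfy the requirement. The sketch below is an assumption-laden illustration, not code from GCC or any libc: the `fs_` prefix is hypothetical and only there so the code can also be compiled next to a hosted libc; in a real freestanding build the functions would be named memcpy, memmove, memset and memcmp. They should be compiled with -fno-tree-loop-distribute-patterns, since otherwise the compiler may turn the loops back into calls to the very functions being defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise copy; the regions must not overlap. */
void *fs_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

/* Overlap-safe copy: forward when dest is below src, backward otherwise. */
void *fs_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)
            *d++ = *s++;
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dest;
}

/* Fills n bytes with the low 8 bits of c. */
void *fs_memset(void *dest, int c, size_t n)
{
    unsigned char *d = dest;

    while (n--)
        *d++ = (unsigned char)c;
    return dest;
}

/* Compares as unsigned bytes; returns <0, 0 or >0. */
int fs_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;

    for (; n; --n, ++x, ++y) {
        if (*x != *y)
            return (int)*x - (int)*y;
    }
    return 0;
}
```

These versions favour size and clarity over speed; a word-at-a-time copy would be faster but needs alignment handling.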

Droopycom
  • We noticed gcc resorted to memcpy when we initialized a local array: int x[4] = {1,2,3,4}; Presumably gcc had to copy this data on the local stack each time we entered the function. So changing it to a static definition made the problem go away. static const int x[4] = {1,2,3,4}; – Phil Hord Apr 11 '12 at 18:35
  • Yes, but you also changed the meaning of the code: i.e. if your function is changing x, then it won't be {1,2,3,4} at the beginning of the next call... – Droopycom Nov 13 '13 at 01:00
  • Correct, but in this case we promise not to change the data. We ensure that promise by declaring it as `const int` array. And yes, we could shoot ourselves by casting away const, but we take gun safety very seriously here and do not entertain such foolishness. The remaining issue might be that our array is now in .data instead of on the local stack. But this also is not a concern for our usage. YMMV. – Phil Hord Feb 18 '14 at 22:01
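
The array-initialization case discussed in these comments can be sketched as follows. The function names `sum_local` and `sum_static` are hypothetical, and whether GCC really emits a memcpy call for the first variant depends on the array size, the target and the optimization level, so treat this as an illustration of the two forms rather than a guaranteed reproduction.

```c
/* Automatic array: the initializer has to be materialized on the stack on
 * every call, which GCC may implement as a memcpy from read-only data. */
int sum_local(void)
{
    int x[4] = {1, 2, 3, 4};
    int i, total = 0;

    for (i = 0; i < 4; ++i)
        total += x[i];
    return total;
}

/* Static const array: it has static storage duration, so nothing is copied
 * at function entry. This is only equivalent because x is never modified. */
int sum_static(void)
{
    static const int x[4] = {1, 2, 3, 4};
    int i, total = 0;

    for (i = 0; i < 4; ++i)
        total += x[i];
    return total;
}
```

Comparing the nm output for an object file built from each variant shows whether the memcpy reference disappears.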
5

You need to disable that optimization with -fno-builtin. I had this problem once when trying to compile memcpy for a C library. It called itself. Oops!

Richard Pennington
  • Thanks for your reply! After reading through the manual, it does seem like -fno-builtin or -ffreestanding should do exactly what I need. But after adding those compiler switches, I'm still getting two references to memcpy. Any other suggestions? – Brian Jun 21 '11 at 03:00
  • @Brian: I've hit the exact same problem. Is this just a GCC bug? – R.. GitHub STOP HELPING ICE Jul 28 '13 at 03:11
  • Yeah, unfortunately, this is not the correct answer: there are cases when the `mem*` calls are still inserted, no matter what. See Droopycom's answer, that's the real one! – Sz. Oct 29 '15 at 23:06
  • @R..: "Is this just a GCC bug?" Should be, but it seems to have been made a feature instead. However, the smart recursive memcpy Richard mentioned is, indeed, a bug (or at least has an open issue): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 – Sz. Oct 29 '15 at 23:19
3

You can also make your binary a "freestanding" one:

The ISO C standard defines (in clause 4) two classes of conforming implementation. A conforming hosted implementation supports the whole standard [...]; a conforming freestanding implementation is only required to provide certain library facilities: those in <float.h>, <limits.h>, <stdarg.h>, and <stddef.h>; since AMD1, also those in <iso646.h>; and in C99, also those in <stdbool.h> and <stdint.h>. [...].

The standard also defines two environments for programs, a freestanding environment, required of all implementations and which may not have library facilities beyond those required of freestanding implementations, where the handling of program startup and termination are implementation-defined, and a hosted environment, which is not required, in which all the library facilities are provided and startup is through a function int main (void) or int main (int, char *[]).

An OS kernel would be a freestanding environment; a program using the facilities of an operating system would normally be in a hosted implementation.

(paragraph added by me)

More here. And the corresponding gcc option/s (keywords -ffreestanding or -fno-builtin) can be found here.
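
One way to confirm which environment a translation unit is being compiled for is the standard `__STDC_HOSTED__` macro, which GCC defines as 0 under -ffreestanding. The function name `compiled_hosted` below is hypothetical and the sketch only reports the setting; it does not by itself stop the compiler from emitting memcpy calls.

```c
/* Returns 1 when this file is compiled for a hosted environment and 0 when
 * it is compiled as freestanding (for example with GCC's -ffreestanding). */
int compiled_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 0
    return 0;
#else
    return 1;
#endif
}
```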

S.S. Anne
Sebastian Mach
  • These options don't prevent gcc from emitting calls to memcpy altogether, but they do seem to help prevent it from compiling a memcpy implementation into a call to itself. – Nate Eldredge Apr 07 '22 at 14:49
1

This is quite an old question, but I've hit the same issue, and none of the solutions here worked.

So I defined this function:

#include <stddef.h> /* size_t */

/* Always-inlined byte-wise copy, used in place of memcpy. */
static __attribute__((always_inline)) inline void* imemcpy (void *dest, const void *src, size_t len) {
  char *d = dest;
  const char *s = src;
  while (len--)
    *d++ = *s++;
  return dest;
}

And then used it instead of memcpy. This has solved the inlining issue for me permanently. Not very useful if you are compiling some sort of library though.

projix