15

Consider the following code:

void MemMove8(void* dst, void* src)
{
    char tmp[8];
    memcpy(tmp, src, 8);
    memcpy(dst, tmp, 8);
}

MSVC (16.7.1) x86 with /O2 generates the following assembly for this function:

; _dst$ = ecx
; _src$ = edx
    mov eax, DWORD PTR [edx]
    mov DWORD PTR [ecx], eax
    mov eax, DWORD PTR [edx+4]
    mov DWORD PTR [ecx+4], eax

But this doesn't work (in some cases) if the input and output buffers overlap.

The generated code seems wrong to me; or is this a valid transformation and I am missing something here?

Alex
  • 709
  • 3
  • 10
  • Does it have any effect to change 2nd arg. to `const void *src`? Maybe, it helps to stop the compiler being overly optimistic if it's explicitly "prohibited" to change the contents of `src`. – Scheff's Cat Aug 16 '20 at 10:49
  • No, it doesn't have any effect. – Alex Aug 16 '20 at 10:54
  • 2
    Yes, but that's why the function first copies the contents from `src` into the temporary buffer `tmp`. Then from `tmp` to `dst`. The temporary buffer does not overlap with either `src` or `dst`. – Alex Aug 16 '20 at 11:09
  • Are you `#include ` and `using std::memcpy;` or are you using the `::memcpy` (in the global namespace, from ``)? I don't have access to MSVC 16.7.1 for x86 target. – Eljay Aug 16 '20 at 11:19
  • I should have posted compilable code :-) I do `#include ` and use `memcpy` from the global namespace. But it doesn't really matter all combinations result in the same assembly. – Alex Aug 16 '20 at 11:26
  • 2
    I presume switching to `std::memmove` fixes the problem? (I wish godbolt had MSVC 16.7.1 x86.) On my platform, `::memcpy` and `std::memcpy` are defined differently (but that's not MSVC 16.7.1). – Eljay Aug 16 '20 at 11:32
  • I think you found a *bona fide* compiler bug. Congratulations! Can you reproduce with MSVC 19.24? If so, file a bug report. If not, the bug has since been fixed. – Eljay Aug 16 '20 at 11:42
  • 1
    MSVC just imports the `::memcpy` function into the std namespace. So `memcpy` and `std::memcpy` are actually the same. The `MemMove8` function exists only because MSVC does not "inline" `memmove(dst, src, 8)` (calls a library function instead of generating some `mov` instructions) but it does "inline" `std::memcpy(dst, src, 8)`. It's just a performance optimization. – Alex Aug 16 '20 at 11:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/219901/discussion-between-eljay-and-alex). – Eljay Aug 16 '20 at 11:42
  • 2
    Yup, looks like a bug. But the fact that you even need this workaround at all means it's time to find a better compiler (e.g. clang) that can inline memmove in the first place, and does the loads first before either store even when inlining memcpy, if it has enough registers: https://godbolt.org/z/GGTcsq – Peter Cordes Aug 16 '20 at 13:03
  • Agreed. 100%. It's a _little_ bit annoying. – Alex Aug 16 '20 at 13:20

1 Answers1

5

It's a bug.

https://developercommunity.visualstudio.com/content/problem/1151407/incorrect-memcpy-optimization.html

This seems to have been fixed in VS 16.8.

Alex
  • 709
  • 3
  • 10