Odd memcpy_s behaviour in VS2015

Question

Recently I was profiling an application and noticed that the assembly implementation of memcpy_s behaves strangely. I'm talking about the implementation residing in Microsoft Visual Studio 14.0\VC\crt\src\i386\memcpy.asm. Execution reaches the CopyUpLargeMov label, where I expect it to choose the SSE2 path or some other available optimized implementation. The code is as follows:

    CopyUpLargeMov:
        bt      __favor, __FAVOR_ENFSTRG        ; check if Enhanced Fast Strings is supported
        jnc     CopyUpSSE2Check                 ; if not, check for SSE2 support
        rep     movsb
        mov     eax,[esp + 0Ch]                 ; return original destination pointer
        pop     esi
        pop     edi
        M_EXIT

No matter how I tweak the optimization settings, execution never reaches CopyUpSSE2Check.
Tested with Release|Win32, VS2015 Update 3, Windows 10 x64.

The actual C++ code

    std::vector<uint8_t> src(1024*1024*20,0);
    std::vector<uint8_t> dst(1024*1024*20,0);
    for (auto i = 0ul; i < 1000; ++i)
    {
        memcpy_s(dst.data(), dst.size(), src.data(), src.size());
    }

Any ideas?

EDIT001:
It seems that x64 does not exhibit the strange behavior; it falls into the Enhanced Fast Strings part of the code. Maybe the above is an x86 limitation?

[OT] Just a FYI: MSVS has done some pretty good work with optimizing `vector` and if the data is a POD type it should be using `memxxx` functions internally. I would think `dst = src` would be just as good here and maybe better. — NathanOliver, Jan 16 '17 at 15:36
I took vector just for convenience, in real code it is `uint8_t*` to `uint8_t*`, but vector is good enough for memcpy_s to exhibit the same odd behaviour — kreuzerkrieg, Jan 16 '17 at 15:43
Note that 64 bit is always guaranteed to have SSE2 as part of the architecture, so no checks needed. — Jester, Jan 16 '17 at 15:45
Does your cpu have Intel's fast string operations? If it does, the `rep movsb` may be *faster* than SSE2. — EOF, Jan 16 '17 at 15:52
You noticed that it doesn't check, I just pointed out **why** it doesn't need to. It wasn't clear whether you knew that or not. — Jester, Jan 16 '17 at 15:57
@EOF The latest and greatest i7, so it does. Actually it does use EFS for x64; the question is why x86 is so unoptimized? — kreuzerkrieg, Jan 16 '17 at 17:34
@Jester, got your point. However, it does check for something besides EFS, but I don't remember exactly what; I will check tomorrow — kreuzerkrieg, Jan 16 '17 at 17:44
@kreuzerkrieg: It's **not unoptimized**. The function tests for *fast hardware string copy*. If the hardware does *not* support fast `rep movsb`, it *falls back to SSE*. — EOF, Jan 16 '17 at 18:18
@EOF, you pinpointed the problem, which is my lagging knowledge of modern assembly: I didn't know what `rep movsb` means. It actually is the EFS path, and everything works as expected. So I just got it wrong. Would you like to convert your comment to an answer so I can accept it? — kreuzerkrieg, Jan 16 '17 at 18:54
@Jester, for the record, x64 `memcpy` checks as follows: it first tries EFS, and if that is not available it goes for SSE without a check, as you said. — kreuzerkrieg, Jan 17 '17 at 06:31
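
The order of checks described in the comments above can be sketched in C++. This is only an illustration of the control flow, not the CRT's code: the enum and function names are hypothetical, and `Other` is a placeholder because the thread does not show what happens on x86 when neither feature is present.

```cpp
// Hypothetical names: they mirror the control flow discussed in the
// comments, not any real CRT symbol.
enum class CopyPath { RepMovsb, Sse2, Other };

// x86: test Enhanced Fast Strings first, then fall back to the SSE2 check.
// "Other" stands in for whatever path is taken when both are missing;
// the thread does not show it.
constexpr CopyPath choose_copy_path_x86(bool enhanced_fast_strings, bool sse2) {
    return enhanced_fast_strings ? CopyPath::RepMovsb
         : sse2                  ? CopyPath::Sse2
                                 : CopyPath::Other;
}

// x64: SSE2 is part of the architecture, so there is no SSE2 check.
constexpr CopyPath choose_copy_path_x64(bool enhanced_fast_strings) {
    return enhanced_fast_strings ? CopyPath::RepMovsb : CopyPath::Sse2;
}
```

On a CPU with Enhanced Fast Strings, both functions select `RepMovsb`, which matches never reaching `CopyUpSSE2Check`.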

score 2 · Answer 1 · answered Jan 17 '17 at 06:37

As @EOF pointed out in his comment, `rep movsb` is the optimization. It copies the data from string to string in hardware, the so-called "Enhanced Fast Strings" optimization, and the SSE2 check is only the fallback for CPUs that lack it. I simply overlooked that; memcpy is working as expected.
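
To see which branch a given machine should take, the relevant CPUID feature bits can be queried directly. The sketch below is a minimal check, assuming that the `__FAVOR_ENFSTRG` flag corresponds to the "Enhanced REP MOVSB/STOSB" bit (CPUID leaf 7, subleaf 0, EBX bit 9); how the CRT actually fills `__favor` is not shown in the question. The helper names are made up for this example.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>   // __cpuid, __cpuidex
  #define HAVE_X86_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>    // __get_cpuid_max, __cpuid_count
  #define HAVE_X86_CPUID 1
#else
  #define HAVE_X86_CPUID 0
#endif

// Reads CPUID(leaf, subleaf) into regs = {EAX, EBX, ECX, EDX}.
// Returns false if the leaf is not supported or the target is not x86.
static bool read_cpuid(unsigned leaf, unsigned subleaf, unsigned regs[4]) {
#if HAVE_X86_CPUID
  #if defined(_MSC_VER)
    int info[4];
    __cpuid(info, 0);                                  // EAX = highest basic leaf
    if (static_cast<unsigned>(info[0]) < leaf) return false;
    __cpuidex(info, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i) regs[i] = static_cast<unsigned>(info[i]);
  #else
    if (__get_cpuid_max(0, nullptr) < leaf) return false;
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
  #endif
    return true;
#else
    (void)leaf; (void)subleaf; (void)regs;
    return false;
#endif
}

// Enhanced REP MOVSB/STOSB: CPUID.(EAX=07H, ECX=0):EBX bit 9.
// Assumed here to be what the CRT calls "Enhanced Fast Strings".
bool cpu_has_enhanced_rep_movsb() {
    unsigned r[4];
    return read_cpuid(7, 0, r) && ((r[1] >> 9) & 1u) != 0;
}

// SSE2: CPUID.01H:EDX bit 26.
bool cpu_has_sse2() {
    unsigned r[4];
    return read_cpuid(1, 0, r) && ((r[3] >> 26) & 1u) != 0;
}
```

If `cpu_has_enhanced_rep_movsb()` returns true on the test machine, taking the `rep movsb` branch instead of `CopyUpSSE2Check` is the expected outcome.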

answered Jan 17 '17 at 06:37

kreuzerkrieg
