A curious string copy function in C

Question

While reading the nginx code, I came across this function:

#define ngx_cpymem(dst, src, n)   (((u_char *) memcpy(dst, src, n)) + (n))

static ngx_inline u_char *
ngx_copy(u_char *dst, u_char *src, size_t len)
{
    if (len < 17) {

        while (len) {
            *dst++ = *src++;
            len--;
        }

        return dst;

    } else {
        return ngx_cpymem(dst, src, len);
    }
}

It's a simple string copy function. But why does it test the length of the string and switch to memcpy when the length is >= 17?

It's not a string copy function really, it is a memory copy function. — JeremyP, Apr 13 '11 at 10:11
Somebody should suggest to the nginx devs to use a Duff's device here... :) — CAFxX, Jul 20 '12 at 05:10

osgx · Accepted Answer · 2012-07-20T04:28:09.480

It is an optimization: for very small strings, a simple copy loop is faster than calling a system (libc) copy function.

A simple while loop is fast for short strings, whereas the system copy function is (usually) optimized for long strings. The system copy function also performs a number of checks and some setup before it starts copying.
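To make the two paths concrete, here is a minimal sketch that only checks they are interchangeable in behavior; it says nothing about speed, which has to be measured on the target compiler and CPU. The names copy_loop, copy_libc and copy_paths_agree are hypothetical and do not come from nginx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: byte-by-byte copy, as in the short-length branch.
 * Returns a pointer just past the last byte written. */
static unsigned char *
copy_loop(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len) {
        *dst++ = *src++;
        len--;
    }

    return dst;
}

/* Hypothetical name: same contract, but delegating to libc memcpy. */
static unsigned char *
copy_libc(unsigned char *dst, const unsigned char *src, size_t len)
{
    return (unsigned char *) memcpy(dst, src, len) + len;
}

/* Returns 1 if both paths copy the same bytes and return dst + len.
 * len must not exceed the 64-byte scratch buffers used here. */
static int
copy_paths_agree(size_t len)
{
    unsigned char  src[64], a[64], b[64];
    size_t         i;

    assert(len <= sizeof(src));

    for (i = 0; i < sizeof(src); i++) {
        src[i] = (unsigned char) (i * 7 + 1);
    }

    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));

    return copy_loop(a, src, len) == a + len
           && copy_libc(b, src, len) == b + len
           && memcmp(a, b, sizeof(a)) == 0;
}
```

Since both functions have the same contract, choosing between them at a length threshold only affects performance, not the result.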

In fact, the author left a comment just before this code; see nginx, /src/core/ngx_string.h (search for ngx_copy):

/*
 * the simple inline cycle copies the variable length strings up to 16
 * bytes faster than icc8 autodetecting _intel_fast_memcpy()
 */

Also, two lines above that comment is:

#if ( __INTEL_COMPILER >= 800 )

So the author took measurements and concluded that ICC's optimized memcpy performs a lengthy CPU check to select the best memcpy variant. He found that copying up to 16 bytes by hand is faster than the fastest memcpy code from ICC.

For other compilers, nginx uses ngx_cpymem (memcpy) directly:

#define ngx_copy                  ngx_cpymem
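Putting the quoted fragments together, the overall shape is roughly the following. This is a sketch under assumptions, not the nginx source: the my_ prefix is hypothetical, plain C99 `static inline` stands in for ngx_inline, and `defined(__INTEL_COMPILER)` is added so the check is explicit on compilers that do not define the macro.

```c
#include <stddef.h>
#include <string.h>

/* memcpy wrapper that returns a pointer just past the copied bytes */
#define my_cpymem(dst, src, n)  (((unsigned char *) memcpy(dst, src, n)) + (n))

#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 800)

/* ICC 8 and later: copy up to 16 bytes with an inline loop,
 * hand anything longer to memcpy */
static inline unsigned char *
my_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    if (len < 17) {

        while (len) {
            *dst++ = *src++;
            len--;
        }

        return dst;
    }

    return my_cpymem(dst, src, len);
}

#else

/* every other compiler: always use memcpy */
#define my_copy  my_cpymem

#endif
```

Callers use my_copy(dst, src, len) the same way in both configurations, so the special case stays confined to the one compiler where it was measured to help.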

The author also studied how different compilers handle memcpy for different sizes:

/*
 * gcc3, msvc, and icc7 compile memcpy() to the inline "rep movs".
 * gcc3 compiles memcpy(d, s, 4) to the inline "mov"es.
 * icc8 compile memcpy(d, s, 4) to the inline "mov"es or XMM moves.
 */

Probably. I imagine that was the intention, and the programmer probably profiled it in their environment, but whether it's always faster is hard to say. — Chris Card, Apr 13 '11 at 09:59
A simple while-loop copy contains no extra checks and does a byte-by-byte copy. It is shorter in terms of asm instructions. — osgx, Apr 13 '11 at 10:00
