Are zero initializers faster than memset?

Question

I maintain legacy C code that, in many places, declares a small array like int a[32]; followed by memset(a, 0, sizeof a); to zero-initialize it.

I'm thinking of refactoring this into int a[32] = {0}; and removing the memset.

The question is: does using a zero initializer generally result in faster code than calling memset?

It depends. The compiler *might* do the zeroing at compile-time for *both* the `memset` and the initialization. Or it might do the initialization at runtime using `memset`. You simply have to build (with optimization) and look at the generated code. — Some programmer dude, Nov 24 '16 at 12:27
What did your profiler say? More importantly, the two can differ for types other than integers, and `memset` will probably not do what the programmer expected on certain platforms. — too honest for this site, Nov 24 '16 at 12:27
As long as it's not slower, I'd really recommend such a change. In other words, even if the generated code still calls `memset()`, your solution is better since it's more high-level. Also, very impressed by legacy code that uses `sizeof a` without parentheses! — unwind, Nov 24 '16 at 12:40
@unwind, this is not "legacy", but good practice to emphasize that the size is taken of an object and not a type. — Jens Gustedt, Nov 24 '16 at 12:44
@JensGustedt That was my point, and why I was impressed. 9/10 (or more!) posts containing `sizeof` here always put the argument in parentheses. — unwind, Nov 24 '16 at 12:45
@user694733 Well, this code is compiled for Windows, BSD, HP-UX, and Linux, on architectures like x86, x64, ARM and MIPS. That's why I asked the question 'in general'. — Calmarius, Nov 24 '16 at 12:55
Maybe related to http://stackoverflow.com/questions/40739415/avoiding-memset-for-a-multi-type-structure — alpereira7, Nov 24 '16 at 12:58
In modern optimizing compilers, `memset` is an intrinsic and the compiler will know about `memset(something0, 0, something1);`. Similarly, it will know about `{0}` and what it does for an array of integers. With this in mind, it should be able to choose the best code regardless of which of the two versions you use. I'd use `{0}` for the reasons that unwind gives. — Petr Skocik, Nov 24 '16 at 13:07

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

TL;DR: Use the initializer - it's never worse than `memset()`.

It depends on your compiler. It shouldn't be any slower than calling memset() (because calling memset() is one option available to the compiler).

The initializer is easier to read than imperatively overwriting the array; it also adapts well if the element type is changed to something where all-bit-zero isn't what you want.
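To illustrate the point about element types, here is a minimal sketch. The `struct entry` type and the function names are hypothetical, made up for this example. The C standard guarantees that `= {0}` gives pointer members a null pointer and floating-point members the value zero, whereas `memset()` only writes all-bits-zero bytes, which the standard does not promise to be a null pointer or 0.0 on every platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct entry {
    const char *name;   /* pointer member */
    double      weight; /* floating-point member */
};

/* Returns 1 if every element holds a null pointer and 0.0. */
static int all_entries_zero(const struct entry *t, size_t n)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        if (t[i].name != NULL || t[i].weight != 0.0)
            return 0;
    }
    return 1;
}

/* The initializer zeroes by value, so this returns 1 on any
   conforming implementation. */
int initializer_zeroes_by_value(void)
{
    struct entry table[4] = {0};
    return all_entries_zero(table, sizeof table / sizeof table[0]);
}
```

Calling `initializer_zeroes_by_value()` should return 1 everywhere; the same check after a `memset()` is only guaranteed on platforms where all-bits-zero happens to represent a null pointer and 0.0 (which is the common case, but not a language guarantee).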

As an experiment, let's see what GCC does with this:

#include <string.h>

int f1()
{
    int a[32] = {0};
    return a[31];
}

int f2()
{
    int a[32];
    memset(a, 0, sizeof a);
    return a[31];
}

Compiling with gcc -S -std=c11 gives:

f1:
.LFB0:
    .file 1 "40786375.c"
    .loc 1 4 0
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $8, %rsp
    .loc 1 5 0
    leaq    -128(%rbp), %rdx
    movl    $0, %eax
    movl    $16, %ecx
    movq    %rdx, %rdi
    rep stosq
    .loc 1 6 0
    movl    -4(%rbp), %eax
    .loc 1 7 0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
f2:
.LFB1:
    .loc 1 10 0
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    addq    $-128, %rsp
    .loc 1 12 0
    leaq    -128(%rbp), %rax
    movl    $128, %edx
    movl    $0, %esi
    movq    %rax, %rdi
    call    memset@PLT
    .loc 1 13 0
    movl    -4(%rbp), %eax
    .loc 1 14 0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

showing that f1() uses rep stosq for the initializer, whereas f2() has the function call, exactly like the C code. It's quite likely that memset() has a more efficient vectorized implementation for large arrays, but for small arrays like this, any benefits would likely be outweighed by the function call overhead.

If we declare a as volatile (so that the optimizer cannot simply eliminate the array), we get to see what happens with optimizations enabled (gcc -S -std=c11 -O3):

f1:
.LFB4:
    .cfi_startproc
    subq    $16, %rsp
    .cfi_def_cfa_offset 24
    xorl    %eax, %eax
    movl    $16, %ecx
    leaq    -120(%rsp), %rdi
    rep stosq
    movl    4(%rsp), %eax
    addq    $16, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
f2:
.LFB5:
    .cfi_startproc
    subq    $16, %rsp
    .cfi_def_cfa_offset 24
    xorl    %eax, %eax
    movl    $16, %ecx
    leaq    -120(%rsp), %rdx
    movq    %rdx, %rdi
    rep stosq
    movl    4(%rsp), %eax
    addq    $16, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

You can see that the two functions now compile to identical code.
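Before doing the refactoring across a code base, it may be worth a quick sanity check that both forms leave the same contents in an int array. The sketch below is an assumption-laden illustration, not part of the original experiment: the function name and the constant `N` are made up, and the byte-wise comparison assumes `int` has no padding bits, which holds on the mainstream platforms mentioned in the question.

```c
#include <assert.h>
#include <string.h>

enum { N = 32 };  /* same array size as in the question */

/* Returns 1 if an {0}-initialized int array and a memset-zeroed one
   are byte-for-byte identical. */
int zeroing_methods_agree(void)
{
    int by_init[N] = {0};
    int by_memset[N];

    memset(by_memset, 0, sizeof by_memset);

    /* memcmp is only meaningful if both arrays have the same size. */
    assert(sizeof by_init == sizeof by_memset);
    return memcmp(by_init, by_memset, sizeof by_init) == 0;
}
```

For integer types, all-bits-zero is a representation of the value zero, so `zeroing_methods_agree()` returning 1 is the expected result; the check says nothing about speed, only that the refactoring preserves behavior for this element type.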

memset() is an intrinsic in most any compiler. It is in GCC. So that doesn't prove anything. — Hans Passant, Nov 24 '16 at 12:52
@Hans, my edit crossed with your comment. I hope my answer is clearer now. Without optimization, the initializer version is inlined; with optimizations, the two examples produce identical machine code. — Toby Speight, Nov 24 '16 at 12:59
