Why does clang do thread-safe init for some globals, but not others?

Question

Consider a global (namespace-scope) variable declared using the inline variable feature introduced in C++17:

struct something {
    something();
    ~something();
};

inline something global;

With Clang 14 on x86-64, the generated assembly that initializes the variable at startup is as follows:

__cxx_global_var_init:                  # @__cxx_global_var_init
        push    rbx
        mov     al, byte ptr [rip + guard variable for global]
        test    al, al
        je      .LBB0_1
.LBB0_4:
        pop     rbx
        ret
.LBB0_1:
        mov     edi, offset guard variable for global
        call    __cxa_guard_acquire
        test    eax, eax
        je      .LBB0_4
        mov     edi, offset global
        call    something::something() [complete object constructor]
        mov     edi, offset something::~something() [complete object destructor]
        mov     esi, offset global
        mov     edx, offset __dso_handle
        call    __cxa_atexit
        mov     edi, offset guard variable for global
        pop     rbx
        jmp     __cxa_guard_release             # TAILCALL
        mov     rbx, rax
        mov     edi, offset guard variable for global
        call    __cxa_guard_abort
        mov     rdi, rbx
        call    _Unwind_Resume@PLT
global:
        .zero   1

guard variable for global:
        .quad   0                               # 0x0

This is a double-checked locking pattern, which makes the initialization thread-safe. The first test al, al is an optimistic check of whether the variable has already been initialized. If it has not, __cxa_guard_acquire is called, which checks the same guard variable again under a lock. This avoids races in which two or more threads both "pass" the initial check: only one will "pass" the second check.
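To make the pattern concrete, here is a rough C++ sketch of the same control flow. It is only an illustration: guard_done, guard_mutex, construct_global and guarded_init are hypothetical names, and the real __cxa_guard_acquire / __cxa_guard_release are implemented differently inside the C++ runtime.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the compiler-emitted guard variable and the
// lock that the runtime takes inside __cxa_guard_acquire.
static std::atomic<bool> guard_done{false};
static std::mutex guard_mutex;
static int constructor_calls = 0;

// Stand-in for something::something() plus the __cxa_atexit registration.
void construct_global() { ++constructor_calls; }

void guarded_init() {
    // First check: cheap and lock-free (the `test al, al` in the listing).
    if (guard_done.load(std::memory_order_acquire)) return;

    // Second check, now under a lock (roughly the job of __cxa_guard_acquire).
    std::lock_guard<std::mutex> lock(guard_mutex);
    if (guard_done.load(std::memory_order_relaxed)) return;

    construct_global();

    // Publish completion (roughly the job of __cxa_guard_release).
    guard_done.store(true, std::memory_order_release);
}

void demo() {
    guarded_init();
    guarded_init();  // returns at the first check
    assert(constructor_calls == 1);
}
```

Calling guarded_init() from several threads at once would still run construct_global() exactly once, because any thread that loses the race finds the guard already set when it re-checks under the lock.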

This pattern is the same one that is used to initialize function-local static variables of non-trivial type (the standard requires those to be initialized lazily).
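For comparison, a minimal function-local static looks like the sketch below. The constructed counter and the get_instance name are my own additions for illustration; the point is only that the object is built on the first call and that the compiler typically emits a guard similar to the one above.

```cpp
#include <cassert>

struct something {
    something() { ++constructed; }
    ~something() {}
    static int constructed;  // hypothetical counter, only here to observe construction
};
int something::constructed = 0;

something& get_instance() {
    // Constructed lazily on the first call. Since C++11 this initialization
    // is required to be thread-safe, so the compiler emits a guarded init.
    static something instance;
    return instance;
}

void demo() {
    assert(something::constructed == 0);  // nothing is built before the first call
    something& a = get_instance();
    something& b = get_instance();
    assert(&a == &b);                     // both calls return the same object
    assert(something::constructed == 1);  // the constructor ran exactly once
}
```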

We may also look at the assembly for the "template static holder" pattern which was often used to implement global variables in headers before C++17, something like so:

struct something {
    something();
    ~something();
};

template <typename T = void>
struct holder {
    static something global;
};

template <typename T>
something holder<T>::global;


void instantiate() {
    (void)holder<void>::global;
}

Here, the holder class template exists so that holder<T>::global can be defined in multiple translation units: the standard requires that this work for static data members of class templates ("let the linker sort it out"), unlike the same case for ordinary namespace-scope globals or static data members of a non-template class. The instantiate() function is there simply to force instantiation of the template and its static member, since otherwise nothing would be emitted at all.

The assembly is as follows:

instantiate():                       # @instantiate()
        ret
__cxx_global_var_init:                  # @__cxx_global_var_init
        push    rax
        cmp     byte ptr [rip + guard variable for holder<void>::global], 0
        je      .LBB1_1
        pop     rax
        ret
.LBB1_1:
        mov     edi, offset holder<void>::global
        call    something::something() [complete object constructor]
        mov     edi, offset something::~something() [complete object destructor]
        mov     esi, offset holder<void>::global
        mov     edx, offset __dso_handle
        call    __cxa_atexit
        mov     byte ptr [rip + guard variable for holder<void>::global], 1
        pop     rax
        ret
holder<void>::global:
        .zero   1

guard variable for holder<void>::global:
        .quad   0                               # 0x0

The double-checked locking is gone: the guard variable is just checked once, outside of any lock.
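In C++ terms, the second listing behaves roughly like the sketch below. The names guard_byte, ctor_calls and unguarded_init are hypothetical and only mirror the assembly; nothing here is what Clang literally generates.

```cpp
#include <cassert>

// Hypothetical stand-ins for the guard byte and the constructor call.
static bool guard_byte = false;
static int ctor_calls = 0;

void unguarded_init() {
    if (guard_byte) return;  // cmp byte ptr [guard], 0 / je
    ++ctor_calls;            // constructor call + __cxa_atexit registration
    guard_byte = true;       // mov byte ptr [guard], 1
}

void demo() {
    unguarded_init();
    unguarded_init();  // guard already set, so nothing happens
    assert(ctor_calls == 1);
}
```

This is correct when called from a single thread, but two threads could both read guard_byte as false and both run the constructor, since there is no lock and no atomic access.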

Why the difference? Is it just an implementation quirk or does this flow somehow from a requirement in the standard?

It would seem that a lock is usually unnecessary for these global constructors, since the generated functions are normally called from single-threaded code: at startup before main is reached, or while dynamically loading a shared object. Perhaps there is some scenario I'm not thinking about, however, such as parallel loading of two shared objects that both refer to the same global?

I don't believe there would be any parallel loading, because the compiler wouldn't generate automatic multithreading code for user functions (apart from synchronization, there may be other issues). Why don't you put a breakpoint in the function and check the thread ID? – Michael Chourdakis, Jul 14 '22 at 20:42
@MichaelChourdakis - I don't think putting a breakpoint and checking the ID can help me answer this. I'm quite sure the ID will be that of the loading thread: the _main_ thread for constructors that run at startup, and the thread that triggered the shared object loading for shared objects loaded dynamically after startup. – BeeOnRope, Jul 24 '22 at 20:15
