Consider a global (namespace-scope) variable declared using the inline
variable feature introduced in C++17:
struct something {
    something();
    ~something();
};
inline something global;
With Clang 14 targeting x86-64, the assembly generated to initialize the variable at startup is as follows:
__cxx_global_var_init: # @__cxx_global_var_init
push rbx
mov al, byte ptr [rip + guard variable for global]
test al, al
je .LBB0_1
.LBB0_4:
pop rbx
ret
.LBB0_1:
mov edi, offset guard variable for global
call __cxa_guard_acquire
test eax, eax
je .LBB0_4
mov edi, offset global
call something::something() [complete object constructor]
mov edi, offset something::~something() [complete object destructor]
mov esi, offset global
mov edx, offset __dso_handle
call __cxa_atexit
mov edi, offset guard variable for global
pop rbx
jmp __cxa_guard_release # TAILCALL
mov rbx, rax
mov edi, offset guard variable for global
call __cxa_guard_abort
mov rdi, rbx
call _Unwind_Resume@PLT
global:
.zero 1
guard variable for global:
.quad 0 # 0x0
This is a double-checked locking pattern, which makes the initialization thread-safe. The first test al, al is an optimistic, lock-free check of whether the variable has already been initialized. If it indicates that the variable has not been initialized, __cxa_guard_acquire is called, which checks the same guard variable again under a lock. This avoids a race where two or more threads both "pass" the initial check: only one of them will "pass" the second check.
This pattern is the same one that is used to initialize function-local static variables of non-trivial type (the standard requires those to be initialized lazily).
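To make the pattern concrete, here is a minimal hand-written sketch of double-checked locking in C++17. It is not what the compiler or the ABI runtime actually does internally: `probe`, `slot`, `ready`, `init_mutex` and `get_probe` are hypothetical names invented for illustration, and the `std::mutex` merely stands in for whatever synchronization `__cxa_guard_acquire` uses.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <optional>

int probe_constructions = 0;  // counts how often the constructor runs

// Hypothetical stand-in for the question's `something`.
struct probe {
    probe() { ++probe_constructions; }
};

std::optional<probe> slot;       // storage for the lazily constructed object
std::atomic<bool> ready{false};  // plays the role of the guard variable
std::mutex init_mutex;           // assumption: stands in for the guard's lock

probe& get_probe() {
    // First check: lock-free fast path (the `test al, al` in the listing).
    if (!ready.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Second check, now under the lock: only one thread constructs.
        if (!ready.load(std::memory_order_relaxed)) {
            slot.emplace();
            ready.store(true, std::memory_order_release);
        }
    }
    assert(ready.load(std::memory_order_acquire));  // post-condition
    return *slot;
}
```

Calling `get_probe()` from any number of threads should run the constructor once; later calls take only the fast path and never touch the mutex.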
We may also look at the assembly for the "template static holder" pattern which was often used to implement global variables in headers before C++17, something like so:
struct something {
    something();
    ~something();
};
template <typename T = void>
struct holder {
    static something global;
};
template <typename T>
something holder<T>::global;
void instantiate() {
    (void)holder<void>::global;
}
Here, the holder class template exists so that holder<T>::global can be defined in multiple translation units: the standard requires this to work for static data members of class templates ("let the linker sort it out"), unlike the same case for namespace-scope globals or static data members of a non-template class. The instantiate() function is there simply to force instantiation of the template and its static member, since otherwise nothing would be emitted at all.
The assembly is as follows:
instantiate(): # @instantiate()
ret
__cxx_global_var_init: # @__cxx_global_var_init
push rax
cmp byte ptr [rip + guard variable for holder<void>::global], 0
je .LBB1_1
pop rax
ret
.LBB1_1:
mov edi, offset holder<void>::global
call something::something() [complete object constructor]
mov edi, offset something::~something() [complete object destructor]
mov esi, offset holder<void>::global
mov edx, offset __dso_handle
call __cxa_atexit
mov byte ptr [rip + guard variable for holder<void>::global], 1
pop rax
ret
holder<void>::global:
.zero 1
guard variable for holder<void>::global:
.quad 0 # 0x0
The double-checked locking is gone: the guard variable is just checked once, outside of any lock.
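For contrast, the logic of this second listing can be sketched by hand as follows. Again this is only an illustration under assumptions: `tracer`, `holder_slot`, `holder_guard` and `global_var_init_sketch` are made-up names, and the real generated code also registers the destructor with `__cxa_atexit`, which is omitted here.

```cpp
#include <cassert>
#include <optional>

int tracer_constructions = 0;  // counts how often the constructor runs

// Hypothetical stand-in for the question's `something`.
struct tracer {
    tracer() { ++tracer_constructions; }
};

std::optional<tracer> holder_slot;  // storage for the object
unsigned char holder_guard = 0;     // plain byte: no atomics, no lock

void global_var_init_sketch() {
    // Single unsynchronized check (the `cmp byte ptr [...], 0`).
    if (holder_guard != 0) return;
    holder_slot.emplace();
    holder_guard = 1;  // set after construction, as in the listing
    assert(holder_slot.has_value());
}
```

Called repeatedly from one thread, this constructs the object once. If two threads entered it at the same time, both could see a zero guard and both would construct, which is exactly the race the first listing guards against.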
Why the difference? Is it just an implementation quirk or does this flow somehow from a requirement in the standard?
It would seem that a lock is usually unnecessary for these global constructors, since the generated functions are normally called from single-threaded code, either at startup before main is reached or while a shared object is being dynamically loaded. Perhaps there is some scenario I'm not thinking of, however, such as two shared objects that both refer to the same global being loaded in parallel?