I'll go ahead and take a stab at answering this, though I'll be the first to admit that you can only go so far with an answer before you run into a wall that says "because someone made a decision and you're stuck with it forever."
The primary key to all of this comes in the form of the Mach-O Runtime specification for macOS, which defines the .bss section as being used for:
"Uninitialized static variables (for example, static int i;)."
You can read about it in this archived version from version 10.3, but you can also find the same information in other Mach-O references.
The important thing to note here is that the use of bss refers to "private" symbols only. In other words, this refers to a C-style use of the static keyword, which is guaranteed to be local to the translation unit.
When you declare a C++17 member variable as static inline, despite the use of the perversely overloaded static keyword, you've created a global object, of which there is guaranteed to only ever be one instance in a program. In other words, every translation unit compiled with this declaration will instantiate it, and the linker will be expected to "coalesce" them into a single instance by picking one of them. This is obviously quite different from the C-style "uninitialized static variable."
macOS host compilers like clang implement this by declaring the symbol as weak DATA, similar for example to how default constructors would be declared (though those would of course be in TEXT).
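For concreteness, here is a minimal sketch of the kind of declaration being discussed. The Buffer type and the two helper functions are hypothetical names made up for illustration, and the array is kept small; nothing here is taken from the original question. Compiling it with something like clang++ -std=c++17 -S and reading the assembly should show how the symbol is emitted on your toolchain (on macOS I would expect to see it marked as a weak definition, but check your own output).

```cpp
#include <cstdint>

// Hypothetical type for illustration only.
struct Buffer {
    // C++17 inline variable: every translation unit that sees this declaration
    // may emit a definition, and the linker keeps exactly one of them.
    static inline std::uint8_t stuff[1024];
};

// Returns the address of the shared array so the compiler cannot discard it.
std::uint8_t* buffer_address() {
    return &Buffer::stuff[0];
}

// Objects with static storage duration are zero-initialized no matter which
// section the compiler chooses to place them in.
bool buffer_is_zeroed() {
    for (std::uint8_t byte : Buffer::stuff) {
        if (byte != 0) return false;
    }
    return true;
}
```

Whatever section the array lands in, the language-level behavior is the same: one object, zero-initialized, shared by every translation unit that includes the declaration.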
To illustrate this point, note that you could get the same effect without C++17 at all. For example, compile each of the following examples and look at the assembly output:
#include <cstdint>

static uint8_t stuff[256000000]; // <- goes into .bss
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
Note that I'm having to do the &stuff thing here to make sure the compiler doesn't optimize away stuff entirely in this case.
Now try this:
#include <cstdint>

uint8_t stuff[256000000]; // <-- goes into __DATA,__common
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
Getting closer. Note that stuff is not put into .bss like you might see on a Linux platform. According again to the Mach-O runtime spec, the common section is used for:
"Uninitialized imported symbol definitions (for example, int i;) located in the global scope (outside of a function declaration)."
Now try this:
#include <cstdint>

__attribute__((weak)) uint8_t stuff[256000000]; // <-- in __DATA,__data
int main() {
    return (int)reinterpret_cast<uint64_t>(&stuff[0]);
}
This is exactly how a static inline C++17 member variable will be defined. Deep under the hood, clang has assigned this symbol to be "coalesced" data, which on x86 just turns into standard DATA. If you really want to dive into the sausage factory, you can actually see that in the LLVM SelectSectionForGlobal function.
if (GO->isWeakForLinker()) {
  if (Kind.isReadOnly())
    return ConstTextCoalSection;
  if (Kind.isReadOnlyWithRel())
    return ConstDataCoalSection;
  return DataCoalSection;
}
And DataCoalSection is correspondingly defined here to be identical to the ordinary data section on everything but PowerPC.
So from my perspective the behavior you're seeing is working as I would expect given the available specifications for the Mach-O runtime.
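If you want to compare all of the cases side by side, a rough sketch like the one below puts them into a single file. The names (local_stuff, global_stuff, weak_stuff, Holder, address_of) are hypothetical, the arrays are deliberately small, and __attribute__((weak)) is a GCC/Clang extension, so this assumes one of those compilers. The comments describe only linkage; which section each symbol ends up in is something to confirm from your own assembly output rather than take from this sketch.

```cpp
#include <cstdint>

static std::uint8_t local_stuff[4096];                // C-style static: internal linkage
std::uint8_t global_stuff[4096];                      // plain global: external linkage
__attribute__((weak)) std::uint8_t weak_stuff[4096];  // weak global (compiler extension)

// Hypothetical holder type for the C++17 case.
struct Holder {
    static inline std::uint8_t inline_stuff[4096];    // C++17 inline static member
};

// Hands out the address of each array; referencing all four keeps the
// compiler from optimizing any of them away.
const std::uint8_t* address_of(int which) {
    switch (which) {
        case 0:  return &local_stuff[0];
        case 1:  return &global_stuff[0];
        case 2:  return &weak_stuff[0];
        case 3:  return &Holder::inline_stuff[0];
        default: return nullptr;
    }
}
```

Compiling this with -S and searching the output for each symbol name lets you compare how the weak and inline cases are emitted against the two ordinary cases.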