
UPDATE: I am not asking people to try this out & see if the code works for them. I am asking whether the code pattern is legal C++, regardless of whether it works for you.

I am investigating what I believe is a bug in the IAR C++ compiler for Renesas RX CPUs. The following code pattern sometimes works on my small embedded system; other times it crashes during initialization of parentRefToChildInstance with a jump to address 0x00000000 (or nearby; I've also seen a jump to 0x00000038). For a given version of the source code the behavior is consistent between compilations, but if the code is perturbed in seemingly irrelevant ways, the behavior sometimes switches.

Is it legal to have pure-virtual-parent-class references to statically-allocated child class objects, or is it not legal because the order of initialization of statically allocated objects cannot be guaranteed?

char aGlobalVar = 0;

struct parent
{
   virtual ~parent() {}
   virtual void method1() = 0;
};

struct child : public parent
{
   child(int someValue) : m_someData(someValue) {}
   virtual ~child() {}
   virtual void method1() { ++aGlobalVar; }
   int m_someData;
};

child childInstance(0x1234abcd);
parent &parentRefToChildInstance = childInstance;

In cases where the crash occurs, the child class object has not been constructed at the point that the parent-class reference is initialized. I suspect the compiler is somehow using the child object's vtable pointer to initialize the parent-class reference, though I haven't confirmed that. But I thought the compiler should be able to initialize a reference knowing only the type of the object it refers to and that object's address, which should be known at compile time and link time, respectively. If that's true, then it should not matter in which order childInstance and parentRefToChildInstance are initialized.

Also, we're still limited to C++03, if that matters.

Here's a main() to go along with the above code...

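#include <cstdio>   // for printf

int main()
{
   // aGlobalVar is a char, so convert it explicitly to match the %u format
   printf("aGlobalVar = %u\n", static_cast<unsigned>(aGlobalVar));
   childInstance.method1();             // direct call on the child object
   printf("aGlobalVar = %u\n", static_cast<unsigned>(aGlobalVar));
   parentRefToChildInstance.method1();  // virtual call through the parent reference
   printf("aGlobalVar = %u\n", static_cast<unsigned>(aGlobalVar));
}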

Normally I would expect it to print this, not crash during static object initialization (even before main() starts):

aGlobalVar = 0
aGlobalVar = 1
aGlobalVar = 2
phonetagger
  • Can you provide a small main implementation which reproduces? – Kenny Ostrom Aug 23 '18 at 23:34
  • Could the child have been destructed, leaving a dangling reference? – Kenny Ostrom Aug 23 '18 at 23:42
  • Are the two static variables in the same file? I recall from years back there were some non-deterministic issues with static initialization if the statics were in separate libraries (.so). Maybe something similar here. – Matthew Fisher Aug 23 '18 at 23:47
  • In the code pattern that others at my company have been using, the `childInstance` is a file-static object, and the `parentRefToChildInstance` is defined as a static member of a different class declared in a header file and defined in the same file as `childInstance`. And I know what you're talking about, you can't depend on the order of static object initialization. But in this case, I ***think*** it should be legal to initialize a reference to refer to an object that hasn't yet been constructed. So I don't think the order of initialization should matter. – phonetagger Aug 23 '18 at 23:52
  • @KennyOstrom Since the `childInstance` is statically allocated (some might call it a "global object"), it won't be destructed until program exit. And the crash happens before `main()` even runs, so dangling references can't be the issue. – phonetagger Aug 23 '18 at 23:55
  • It should work all the time. Works on my archlinux gcc8.1.1 -std=c++03 every time. Maybe unrelated heisenbug like buffer overflow. – KamilCuk Aug 23 '18 at 23:56
  • @KamilCuk No, not buffer overflow. I can guarantee that. If I step through the initialization code, right before the crash, the debugger tells me it's on the line where `parentRefToChildInstance` is initialized. If I look at the assembly & CPU registers, it tries to jump to 0x00000000 at a point only a few instructions into its initialization. – phonetagger Aug 23 '18 at 23:59
  • Can you post the generated assembly by the compiler that is causing the crash? What compiler do you use? What switches? – KamilCuk Aug 23 '18 at 23:59
  • @KamilCuk We're using IAR for the Renesas RX CPU, compiler version 3.10. The same issue also occurs with version 4.10, but we can't switch anyway due to other issues. – phonetagger Aug 24 '18 at 00:01
  • The code in your question is undoubtedly legal. But it's different from the code you are using, in possibly important ways. Moving things around into different files does matter. – Ben Voigt Aug 24 '18 at 00:20
  • My crystal ball says that `parentRefToChildInstance` is being accessed during static initialization in another compilation unit, and that compilation unit is running code before either `childInstance` or `parentRefToChildInstance` have undergone dynamic initialization. – Ben Voigt Aug 24 '18 at 00:23
  • It would be interesting to see assembly/disassembly for the source containing the definition of `parentRefToChildInstance`. – aschepler Aug 24 '18 at 00:23
  • @aschepler Since it's done during static object initialization, it doesn't seem to make a lot of intuitive sense. There's no source code to compare the assembly code against, at least not easily. – phonetagger Aug 24 '18 at 00:25
  • @BenVoigt Your crystal ball gives very good suggestions, but I don't think that's the case. If I set a breakpoint on the line where `parentRefToChildInstance` is defined, the debugger breaks (during static object initialization, before `main()` starts) just a few instructions before the crash. – phonetagger Aug 24 '18 at 00:27
  • @phonetagger: I've never seen a debugger that can put a breakpoint on a static variable definition... I don't think you can reliably assume you are actually at the point where the reference should be initialized, as opposed to some access of it. – Ben Voigt Aug 24 '18 at 00:30
  • Recommendation: Change `parentRefToChildInstance` from a static data member to a function returning a reference. Then there will be no "initialization" of it that can happen at the wrong time (although it can still be used before `childInstance` is initialized). You can fix that one by moving `childInstance` into the function as a static local object. – Ben Voigt Aug 24 '18 at 00:34
  • @BenVoigt Funny you should suggest that. We did that, and it does solve the problem, presumably since the initialization of the reference occurs after the child object's constructor runs. I could be totally wrong on the vtable pointer thing, but regardless, changing to a function returning a reference does fix the crash. But the fact that it crashes at all is troubling. We are also seeing other strange problems with static initialization of references and pointers. Sometimes statically allocated pointers which clearly should point to something get statically initialized to NULL. – phonetagger Aug 24 '18 at 00:39
  • @phonetagger: Of course pointers get statically initialized to NULL. Initialization with an actual object address occurs during the dynamic initialization phase. See https://stackoverflow.com/q/35721031/103167 – Ben Voigt Aug 24 '18 at 00:46
  • @phonetagger: I guarantee that your problem is caused by using these variables during the dynamic initialization phase. The initialization of the reference isn't failing, it just hasn't been done yet by the time something else uses the reference. So you have undefined behavior. What the debugger shows you during undefined behavior is unfortunately not trustworthy. – Ben Voigt Aug 24 '18 at 00:49
  • @BenVoigt You all use some site to post code outside of SO, where can I show you the demo of the other code that fails? Then you will believe me. – phonetagger Aug 24 '18 at 00:50
  • @phonetagger: I do believe you. I think you accurately described what happened in the debugger, I just think you drew the wrong conclusion from it. – Ben Voigt Aug 24 '18 at 00:55
  • @BenVoigt See: http://rextester.com/FRUJK67421 That shows a different bug, same compiler. On my RX system, if I step through the resulting assembly of the function, after checking the internal bool that tracks whether the static variables have been initialized, the code first initializes the pointer to the correct address. Then the assembly code block-copies the other statically initialized data from ROM to its RAM location, overwriting the already-initialized pointer with NULL. Now that's not user error. – phonetagger Aug 24 '18 at 00:59
  • @phonetagger: Well that's unfortunate. It sounds to me like a mismatch between compiler and linker, both think it's the other one's job. To wit, since the linker is responsible for laying out static objects in memory, the compiler expects the linker to write the address it chose into the ROM block, so the block copy will work. And the linker writes NULL in the ROM block, thinking the compiler will generate code to overwrite it. – Ben Voigt Aug 24 '18 at 01:05
  • @BenVoigt Our app's data section is relocatable (we simultaneously run 4 independent apps on bare metal with a context switcher), so all pointers to data have to be initialized at runtime. On desktops/servers, that would be done by the application loader, which replaces "relocs" (a.k.a. “fixups”) in the executable image with the data’s addresses. On bare metal there's no "loader", the compiler emits code to patch up those holes that in a desktop .elf or .exe would be called "relocs". In my case it did so, but was thwarted immediately afterward by the wrong-order block copy init bug. – phonetagger Aug 24 '18 at 13:59
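
The recommendation in the comments above (replace the reference with a function returning a reference, and move `childInstance` into that function as a static local) could look roughly like the following sketch. It is only an illustration: the names `getParentRef` and `callTwice` and the `m_calls` counter are hypothetical and do not appear in the original code. Note also that C++03 gives no thread-safety guarantee for the first-call initialization of a function-local static, which may matter on a system with its own context switcher.

```cpp
#include <cassert>

struct parent
{
   virtual ~parent() {}
   virtual void method1() = 0;
};

struct child : public parent
{
   explicit child(int someValue) : m_someData(someValue), m_calls(0) {}
   virtual void method1() { ++m_calls; }
   int m_someData;
   int m_calls;   // hypothetical counter, added only to make the effect observable
};

// Hypothetical accessor that replaces the namespace-scope reference.
// The static local is constructed the first time control passes through
// its declaration, so a caller can never obtain a reference to an object
// whose constructor has not yet run.
parent &getParentRef()
{
   static child childInstance(0x1234abcd);
   return childInstance;
}

// Usage sketch: every call yields the same, already-constructed object.
int callTwice()
{
   getParentRef().method1();
   getParentRef().method1();
   assert(&getParentRef() == &getParentRef());
   return static_cast<child &>(getParentRef()).m_calls;
}
```

Because the object is created on first use, other static initializers (even in other translation units) can call `getParentRef()` without depending on initialization order.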

1 Answer


The code shown is legal.

It's true that the order of initialization of objects and references defined in namespace scope or as static class members is unpredictable when the definitions are in different translation units, and this can often lead to nasty problems.

But initializing a reference doesn't actually require the bound object to be initialized, unless virtual inheritance gets involved.

C++17 [basic.life] paragraph 7 says:

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated..., any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage, and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:

  • the glvalue is used to access the object, or

  • the glvalue is used to call a non-static member function of the object, or

  • the glvalue is bound to a reference to a virtual base class, or

  • the glvalue is used as the operand of a dynamic_cast or as the operand of typeid.

None of those four things are happening during the initialization of parentRefToChildInstance, in particular because parent is not a virtual base class of child. So the code falls into the case mentioned in the quoted requirement as being well-defined.
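
As a rough illustration of that reading of the quoted C++17 wording, the sketch below binds a base-class reference before the derived object has been constructed and only uses it afterwards. The names `lateChild`, `earlyRef`, and `callThroughEarlyRef` are hypothetical, and `method1` returns a value here (unlike in the question) purely so the result can be checked.

```cpp
#include <cassert>

struct parent
{
   virtual ~parent() {}
   virtual int method1() = 0;
};

struct child : public parent
{
   explicit child(int someValue) : m_someData(someValue) {}
   virtual int method1() { return m_someData; }
   int m_someData;
};

// Declared but not yet defined: the storage exists for the whole program,
// but the constructor runs only when the definition below is initialized.
extern child lateChild;

// Bound before lateChild is constructed. parent is a non-virtual base of
// child, so the binding needs only the address of lateChild's storage.
parent &earlyRef = lateChild;

child lateChild(7);

// Safe once lateChild's constructor has run, e.g. anywhere reached from main().
// Calling this from an initializer that runs before lateChild is constructed
// would fall under the "call a non-static member function" bullet instead.
int callThroughEarlyRef()
{
   assert(&earlyRef == static_cast<parent *>(&lateChild));
   return earlyRef.method1();
}
```

The distinction the sketch tries to show is between binding the reference, which needs only allocated storage, and using it to reach the object, which needs the object's lifetime to have started.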

aschepler