77

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation, but in the process I wrote this (in my opinion) interesting code.

public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }
    public A()
    {
        B = new B(this);
        Name = "test";
    }
}

public class B
{
    public A A { get; private set; }
    public B(A a)
    {
        //a.Name is null
        A = a;
    }
}

What I find interesting is that I cannot think of any other way to observe an object of type A in a state that is not yet fully constructed, and that includes using threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?

Stilgar
  • 17
    Why do you expect it to be invalid? – leppie Oct 05 '11 at 12:51
  • 1
    Because my understanding is that a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object. – Stilgar Oct 05 '11 at 12:53
  • 1
    The code is valid, but not very reliable. Stilgar is right: the instance of class `A` passed to the `B` constructor is not fully initialized. You have to create the new instance of `B` in `A` after the instance of `A` is fully initialized, so it should be the last line in the constructor. – Karel Frajták Oct 05 '11 at 12:57

8 Answers

108

Why is this even valid?

Why do you expect it to be invalid?

Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.

Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.

Are there any other ways to observe the state of an object that is not fully constructed?

Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:

  • Put "this" in a static field and reference it from another thread
  • make a method call or constructor call and pass "this" as an argument
  • make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.
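
As a minimal sketch of the virtual-call case, assuming two hypothetical classes `Base` and `Derived` (none of these names come from the question), the override below runs while the derived field is still null:

```csharp
using System;

public class Base
{
    // Hypothetical property that records what the virtual call returned.
    public string Observed { get; }

    public Base()
    {
        // Virtual call from a constructor: this dispatches to the most
        // derived override, although the Derived constructor body has
        // not run yet.
        Observed = Describe();
    }

    protected virtual string Describe() => "base";
}

public class Derived : Base
{
    private readonly string name;

    public Derived()
    {
        // Runs only after Base() has finished.
        name = "test";
    }

    // Invoked from Base() while 'name' is still null.
    protected override string Describe() => name ?? "(name not yet assigned)";
}
```

Here `new Derived().Observed` is `"(name not yet assigned)"`, because the override saw the object before the `Derived` constructor body assigned `name`.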

I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!

Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.
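
A rough sketch of a destructor written with that in mind, assuming a hypothetical `NativeBuffer` class that owns unmanaged memory (the class and its members are illustrative only):

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class NativeBuffer
{
    // Stays IntPtr.Zero if the constructor throws before the allocation.
    private IntPtr handle;

    public NativeBuffer(int size)
    {
        if (size <= 0)
        {
            // Thrown before 'handle' is assigned; the object still exists
            // and is still eligible for finalization.
            throw new ArgumentOutOfRangeException(nameof(size));
        }

        handle = Marshal.AllocHGlobal(size);
    }

    ~NativeBuffer()
    {
        // This may run on an instance whose constructor never completed,
        // so it must not assume that 'handle' was ever assigned.
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
    }
}
```

Calling `new NativeBuffer(0)` throws, and whenever the finalizer later runs on that instance it finds `handle` still zero and does nothing. When that happens is up to the garbage collector, so the sketch makes no claim about timing.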

Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.

If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type create a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.

Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.

In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.

For more information on how value type constructors work, see:

Debunking another myth about value types

And for more information on how C# language semantics try to save you from yourself, see:

Why Do Initializers Run In The Opposite Order As Constructors? Part One

Why Do Initializers Run In The Opposite Order As Constructors? Part Two

I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.


Now let's consider the root cause of your question: how do you make immutable objects that reference each other?

Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:

  • An immutable list of immutable nodes, each of which contains a value.
  • An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.

Now the way you make nodes A and B "reference" each other is:

var A = new Node("A");
var B = new Node("B");
var G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);

And you're done, you've got a graph where A and B "reference" each other.

The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.
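
One possible shape for such a graph, as a sketch only: `Node`, `Edge`, `Graph` and the `Successors` helper are assumed names, and the arrays are copied on every change for simplicity rather than efficiency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Node
{
    public string Value { get; }
    public Node(string value) { Value = value; }
}

public sealed class Edge
{
    public Node Start { get; }
    public Node End { get; }
    public Edge(Node start, Node end) { Start = start; End = end; }
}

public sealed class Graph
{
    public static readonly Graph Empty = new Graph(new Node[0], new Edge[0]);

    private readonly Node[] nodes;
    private readonly Edge[] edges;

    private Graph(Node[] nodes, Edge[] edges)
    {
        this.nodes = nodes;
        this.edges = edges;
    }

    // Every "mutation" returns a new graph; the existing one never changes.
    public Graph AddNode(Node node) =>
        new Graph(nodes.Concat(new[] { node }).ToArray(), edges);

    public Graph AddEdge(Node start, Node end)
    {
        if (!nodes.Contains(start) || !nodes.Contains(end))
            throw new ArgumentException("Both nodes must already be in the graph.");
        return new Graph(nodes, edges.Concat(new[] { new Edge(start, end) }).ToArray());
    }

    // Navigation goes through the graph: a node alone does not know its neighbours.
    public IEnumerable<Node> Successors(Node node) =>
        edges.Where(e => ReferenceEquals(e.Start, node)).Select(e => e.End);
}
```

With this sketch, getting from A to B means asking the graph, for example `G.Successors(A)`, which is exactly the extra level of indirection described above.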

Eric Lippert
  • Thanks a lot. I've read the articles about value types, and they were part of the reason why I thought the language tries to guarantee the full construction of the object before it can be observed. After all, this is why the value is copied around. – Stilgar Oct 05 '11 at 15:19
  • 1
    @Stilgar: We try to be a "pit of quality" language, where you really have to work hard to write a program that does something crazy. Unfortunately, it is very difficult to design a useful language in which it is *guaranteed* that an object will *never* be observed in an inconsistent state, so we don't try to *guarantee* that. We just try to nudge you strongly in that direction. (This is basically why non-nullable reference types don't work in .NET; it is very hard to guarantee *in the type system* that a field of non-nullable reference type is *never* observed to be null.) – Eric Lippert Oct 05 '11 at 15:23
  • 3
    Yeah, it seems like you did such a good job that I was expecting you to prevent me from writing the above code. – Stilgar Oct 05 '11 at 15:28
  • @Stilgar: The problem is that if we do, then we also prevent you from writing a lot of useful code. It is sometimes very useful to be able to pass "this" to a method or constructor of another class, particularly in these sorts of initialization scenarios. I write code like that every day: in the compiler, we are often in situations where immutable "code analyzers" are constructing immutable "symbols", and they have to be able to mutually reference each other. – Eric Lippert Oct 05 '11 at 15:35
  • @EricLippert: It is in fact useful. I have cases where it was useful to pass a `this` reference to other objects from inside the constructor. But then why doesn't the compiler allow using the `this` keyword from field initializers? Both can see partially constructed objects. [This question](http://stackoverflow.com/questions/6125247) didn't really provide a rationale for this limitation. If you happen to know the reason, please share. – Allon Guralnek Oct 11 '11 at 21:36
  • @AllonGuralnek: Two reasons. First, it is simply much more error-prone to access "this" from a field initializer. Second, by ensuring that "this" is never referenced in a field initializer we guarantee that a readonly field initialized with a field initializer is never observed to be in its uninitialized state. (OK, that is not quite true; if an exception is thrown by one of the field initializers then *the finalizer* sees the field in its uninitialized state.) – Eric Lippert Oct 11 '11 at 21:45
48

Yes, this is the only way for two immutable objects to refer to each other - at least one of them must see the other in a not-fully-constructed way.

It's generally a bad idea to let this escape from your constructor but in cases where you're confident of what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.

Jon Skeet
  • 2
    I offered an example of mutual references using `this` in [this answer](http://stackoverflow.com/questions/4556393/how-can-i-instantiate-immutable-mutually-recursive-objects/4556681#4556681) to another question. – Brian Oct 05 '11 at 18:36
22

"Fully constructed" is defined by your code, not by the language.

This is a variation on calling a virtual method from the constructor; the general guideline is: don't do that.

To correctly implement the notion of "fully constructed", don't pass this out of your constructor.

H H
8

Indeed, leaking the this reference out during the constructor will allow you to do this; it may cause problems if methods get invoked on the incomplete object, obviously. As for "other ways to observe the state of an object that is not fully constructed":

  • invoke a virtual method in a constructor; the subclass constructor will not have been called yet, so an override may try to access incomplete state (fields declared or initialized in the subclass, etc)
  • reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling the constructor at all)
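
A small sketch of the reflection route, using `FormatterServices.GetUninitializedObject` as named above. The `Account` class and the `UninitializedDemo` helper are hypothetical, and newer runtimes may flag `FormatterServices` as obsolete with a compiler warning.

```csharp
using System;
using System.Runtime.Serialization;

public sealed class Account
{
    public string Owner { get; }

    public Account()
    {
        Owner = "test";
    }
}

public static class UninitializedDemo
{
    public static Account CreateWithoutConstructor()
    {
        // Allocates a zeroed instance and skips every constructor,
        // so Owner is never assigned and stays null.
        return (Account)FormatterServices.GetUninitializedObject(typeof(Account));
    }
}
```

`UninitializedDemo.CreateWithoutConstructor().Owner` is null, whereas `new Account().Owner` is `"test"`.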
Marc Gravell
6

If you consider the initialization order

  • Derived static fields
  • Derived static constructor
  • Derived instance fields
  • Base static fields
  • Base static constructor
  • Base instance fields
  • Base instance constructor
  • Derived instance constructor

Clearly, through up-casting you can access the object BEFORE the derived instance constructor is called. This is the reason you shouldn't call virtual methods from constructors: they could easily access derived fields that the constructor has not yet initialized, or run before the derived class constructor has brought the object into a "consistent" state.
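
The order can be checked with a small logging sketch. `InitLog`, `BaseType` and `DerivedType` are made-up names, and both classes have explicit static constructors so that the timing of static initialization is precise:

```csharp
using System;
using System.Collections.Generic;

public static class InitLog
{
    public static readonly List<string> Entries = new List<string>();

    // Records a step and returns a dummy value so it can be used in initializers.
    public static int Record(string step)
    {
        Entries.Add(step);
        return 0;
    }
}

public class BaseType
{
    private static int baseStatic = InitLog.Record("Base static fields");
    private int baseInstance = InitLog.Record("Base instance fields");

    static BaseType() { InitLog.Record("Base static constructor"); }
    public BaseType() { InitLog.Record("Base instance constructor"); }
}

public class DerivedType : BaseType
{
    private static int derivedStatic = InitLog.Record("Derived static fields");
    private int derivedInstance = InitLog.Record("Derived instance fields");

    static DerivedType() { InitLog.Record("Derived static constructor"); }
    public DerivedType() { InitLog.Record("Derived instance constructor"); }
}
```

After the first `new DerivedType()`, `InitLog.Entries` should list the eight steps in the order given above, with the derived instance fields initialized before anything in the base class runs.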

xanatos
4

You can avoid the problem by instantiating B last in your constructor:

public A()
{
    Name = "test";
    B = new B(this);
}

If what you suggest was not possible, then A would not be immutable.

Edit: fixed, thanks to leppie.

Nick
  • You write to instantiate B last in the constructor, yet in the example you instantiate it first, just like in the code from OP. Typo? – Avada Kedavra Oct 05 '11 at 12:59
  • 2
    I think the OP knows this and was asking a more fundamental question. – H H Oct 05 '11 at 13:08
  • @Nick: Very good, until you have 3 immutable classes :) `public A() { Name = "test"; B = new B(this); C = new C(this); }` – VMykyt Oct 12 '11 at 15:44
3

The principle is: don't let your this object escape from the constructor body.

Another way to observe such a problem is by calling virtual methods inside the constructor.

Prince John Wesley
1

As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes this from a constructor will know whether an object has been constructed well enough to satisfy his needs.

I would add, however, that for objects which are intended to be truly immutable, one must avoid passing this to any code which will examine the state of a field before it has been assigned its final value. This implies that this not be passed to arbitrary outside code, but does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.

If one were designing a language to facilitate the construction and use of immutable objects, it might be helpful for it to allow methods to be declared as usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that they should be non-dereferenceable. Under such a system, it would be possible for a compiler to allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.

Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness with Java and .NET is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.

supercat