Upcast/Downcast and serialization

Question

Just playing around with casting. Assume we have two classes:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Instantiate both of them

        Base b1 = new Base {a = 1};
        Inh i1 = new Inh {a = 2, b = 2};

Now, let's try an upcast:

        // Upcast
        Base b2 = i1;

It seems that b2 still holds field b, which is present only in the Inh class. Let's check by downcasting.

        // Downcast
        var b3 = b2;
        var i2 = b2 as Inh;
        var i3 = b3 as Inh;

        bool check = (i2 == i3);

Check is true here (I guess because i2 and i3 reference the same instance as i1). OK, let's see how they would be stored in a list.

        var list = new List<Base>();

        list.Add(new Base {a = 5});
        list.Add(new Inh {a = 10, b = 5});

        int sum = 0;
        foreach (var item in list)
        {
            sum += item.a;
        }

Everything is okay, as sum is 15. But when I try to serialize the list using XmlSerializer (just to see what's inside), it throws an InvalidOperationException: "The type ConsoleApplication1.Inh was not expected". Well, fair enough, because it's a list of Bases.

So, what actually is b2? Can I serialize a list of Bases and Inhs? Can I get Inh's fields by downcasting items from the deserialized list?

score 4 · Answer 1 · answered Sep 23 '13 at 09:27

If you want it to work with serialization, you'll need to tell the serializer about the inheritance. In the case of XmlSerializer, this is:

using System.Xml.Serialization; // XmlInclude and XmlSerializer live here

[XmlInclude(typeof(Inh))]
public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Then the following works fine:

var list = new List<Base>();

list.Add(new Base { a = 5 });
list.Add(new Inh { a = 10, b = 5 });

var ser = new XmlSerializer(list.GetType());
var sb = new StringBuilder();
using (var xw = XmlWriter.Create(sb))
{
    ser.Serialize(xw, list);
}
string xml = sb.ToString();
Console.WriteLine(xml);
using (var xr = XmlReader.Create(new StringReader(xml)))
{
    var clone = (List<Base>)ser.Deserialize(xr);
}

with clone having the expected 2 objects of different types. The xml is (reformatted for readability):

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfBase
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Base><a>5</a></Base>
    <Base xsi:type="Inh"><a>10</a><b>5</b></Base>
</ArrayOfBase>
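
As a hedged sketch of the round trip: when the base class cannot be decorated with `[XmlInclude]`, the `XmlSerializer(Type, Type[])` constructor accepts the derived types as "extra types" instead. The example below assumes the `Base` and `Inh` classes from the question (without the attribute) and shows that `b` survives deserialization and is reachable by downcasting. The `Program` wrapper is only there to make the sample self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Base { public int a; }
public class Inh : Base { public int b; }

public static class Program
{
    public static void Main()
    {
        var list = new List<Base> { new Base { a = 5 }, new Inh { a = 10, b = 5 } };

        // Derived types are announced here instead of via [XmlInclude] on Base.
        var ser = new XmlSerializer(typeof(List<Base>), new[] { typeof(Inh) });

        string xml;
        using (var sw = new StringWriter())
        {
            ser.Serialize(sw, list);
            xml = sw.ToString();
        }

        List<Base> clone;
        using (var sr = new StringReader(xml))
        {
            clone = (List<Base>)ser.Deserialize(sr);
        }

        foreach (Base item in clone)
        {
            // The second element comes back as a real Inh, so the downcast succeeds.
            Inh inh = item as Inh;
            Console.WriteLine(inh != null
                ? "Inh: a=" + item.a + ", b=" + inh.b
                : "Base: a=" + item.a);
        }
    }
}
```

With several inheritors, each one would need to be listed, either in the extra-types array or as one `[XmlInclude]` per type.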

Marc Gravell


Then, another question (more detailed): how Inh would be stored in array of Bases? Would b be just dropped? – Vitalii Vasylenko Sep 23 '13 at 09:41
@VitaliiVasylenko I'm writing a long and detailed answer on that ... give me 1 minute – Marc Gravell Sep 23 '13 at 09:45
But wait.. when we are serializing, we are telling that Base class is typeof Inherited? Maybe vice versa? – Vitalii Vasylenko Sep 24 '13 at 20:27
@VitaliiVasylenko the only thing we told the serializer was `typeof(List<Base>)`, via `list.GetType()`. From `Base`, the serializer can *infer* that it also needs to know about `Inh` – Marc Gravell Sep 24 '13 at 20:59
Okaay.. so, if I had several inheritors, I'd need to add them all to the base class? Yep, fair enough, as we have a list of the Base class, so it should be told about its inheritors... Btw, do you know if Newtonsoft's Json lib has the same or similar attributes? – Vitalii Vasylenko Sep 24 '13 at 21:12
Hmm.. so, to avoid crashes, isn't it better to use `List<Inh>` for the list when both kinds of classes can be found (both Base and Inh)? Then I'd have a `List<Inh>`, and if an item is a Base, I would have an "almost" Inh, but with less detail? – Vitalii Vasylenko Sep 24 '13 at 23:11
@VitaliiVasylenko if you tell the serializer to expect a `List<Inh>`, but actually give it a `List<Base>`, it will throw an exception. And you **cannot** put a `Base` in a `List<Inh>` - so there is no solution here that involves a `List<Inh>` (unless you no longer need to serialize any `Base` instances) – Marc Gravell Sep 25 '13 at 06:51
But I can downcast Base to Inh (by setting empty extra fields) and put it into a `List<Inh>`. Or is that totally the wrong way? The thing is that I have lots of items with different levels of detail (Base and Inh), and I need to store them in one list, which should be serialized/deserialized. – Vitalii Vasylenko Sep 25 '13 at 11:04
@VitaliiVasylenko no, you cannot downcast a `Base` to an `Inh`. If the actual object (not the variable) is a `Base`, then it **is not** an `Inh`, and cannot be treated as one in any scenario. If you want to store heterogeneous items in the same list, that is fine in this case - you just use a `List<Base>`. An `Inh` is a `Base`, but a `Base` is not an `Inh` – Marc Gravell Sep 25 '13 at 13:11

score 2 · Accepted Answer · answered Sep 23 '13 at 09:52

Actually, the question is about what happens in memory

So; not serialization, then. K.

Let's take it from the top, then:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Here we have two reference types (classes); the fact that they are reference-type is very important, because that directly influences what is actually stored in arrays / variables.

Base b1 = new Base {a = 1};
Inh i1 = new Inh {a = 2, b = 2};

Here we create 2 objects; one of type Base, and one of type Inh. The *reference* to each object is stored in b1 / i1 respectively. I've italicized the word reference for a reason: it is not the object that is there. The object is somewhere arbitrary on the managed heap. Essentially b1 and i1 are just holding the memory address to the actual object. Side note: there are minor technical differences between "reference", "address" and "pointer", but they serve the same purpose here.

Base b2 = i1;

This copies the reference, and assigns that reference to b2. Note that we haven't copied the object. We still only have 2 objects. All we have copied is the number that happens to represent a memory address.

var b3 = b2;
var i2 = b2 as Inh;
var i3 = b3 as Inh;
bool check = (i2 == i3);

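Here we do the same thing in reverse: b3 gets another copy of the same reference, and each `as Inh` performs a type test and hands back that same reference, now typed as Inh. No object is copied or changed, so i2 and i3 refer to the one object created for i1, which is why check is true.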
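
A minimal sketch that makes the "one object, many references" point observable, assuming the `Base` and `Inh` classes from the question. The `Program` wrapper and the value 42 are only illustrative.

```csharp
using System;

public class Base { public int a; }
public class Inh : Base { public int b; }

public static class Program
{
    public static void Main()
    {
        Inh i1 = new Inh { a = 2, b = 2 };
        Base b2 = i1;        // upcast: only the reference is copied
        Base b3 = b2;        // another copy of the same reference
        Inh i2 = b2 as Inh;  // type test, then the same reference typed as Inh
        Inh i3 = b3 as Inh;

        Console.WriteLine(object.ReferenceEquals(i1, b2)); // True
        Console.WriteLine(object.ReferenceEquals(i2, i3)); // True

        // A write through one variable is visible through all the others,
        // because there is only one object.
        b2.a = 42;
        Console.WriteLine(i1.a);              // 42

        // The runtime type never changed, whatever the variable's type says.
        Console.WriteLine(b2.GetType().Name); // Inh
    }
}
```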

var list = new List<Base>();

list.Add(new Base {a = 5});
list.Add(new Inh {a = 10, b = 5});

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
}

The list here is a list of references. The objects are still somewhere arbitrary on the managed heap. So yes, we can iterate through them. Because all Inh are also Base, there is no issue whatsoever here. So finally, we get to the question (from comments):

Then, another question (more detailed): how Inh would be stored in array of Bases? Would b be just dropped?

Absolutely not. Because they are reference-types, the list never actually contains any Inh or Base objects - it only contains the reference. The reference is just a number - 120934813940 for example. A memory address, basically. It doesn't matter at all whether we think 120934813940 points to a Base or an Inh - our talking about it in either terms doesn't impact the actual object located at 120934813940. All we need to do is perform a cast, which means: instead of thinking of 120934813940 as a Base, think of it as an Inh - which involves a type-test to confirm that it is what we suspect. For example:

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
    if(item is Inh)
    {
       Inh inh = (Inh)item;
       Console.WriteLine(inh.b);
    }
}

So b was there all the time! The only reason we couldn't see it is that we only assumed that item was a Base. To get access to b we need to cast the value. There are three important operations commonly used here:

obj is Foo - performs a type test returning true if the value is non-null and is trivially assignable as that type, else false
obj as Foo - performs a type test, returning the reference typed as Foo if it is non-null and is a match, or null otherwise
(Foo)obj - performs a type test, returning null if it is null, the reference typed as Foo if it is a match, or throws an exception otherwise

So that loop could also be written as:

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
    Inh inh = item as Inh;
    if(inh != null)
    {
       Console.WriteLine(inh.b);
    }
}
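
To see the three operations side by side, here is a small sketch, again assuming the `Base` and `Inh` classes from the question. The last part uses the `is` pattern form, which needs C# 7.0 or later and so postdates this answer. Variable names are illustrative.

```csharp
using System;

public class Base { public int a; }
public class Inh : Base { public int b; }

public static class Program
{
    public static void Main()
    {
        Base plain = new Base { a = 1 };
        Base derived = new Inh { a = 2, b = 3 };

        // "is": a pure type test.
        Console.WriteLine(plain is Inh);            // False
        Console.WriteLine(derived is Inh);          // True

        // "as": null when the object is not an Inh.
        Console.WriteLine((plain as Inh) == null);  // True

        // Cast syntax: throws when the object is not an Inh.
        try
        {
            Inh bad = (Inh)plain;
            Console.WriteLine(bad.b);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }

        // C# 7.0+ pattern form: type test and typed variable in one step.
        if (derived is Inh inh)
        {
            Console.WriteLine(inh.b);               // 3
        }
    }
}
```

The failed cast on `plain` is the in-memory counterpart of the point made in the comments: an object created as a `Base` can never be treated as an `Inh`.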

Thanks for the deeply detailed answer, that makes things much clearer. So, when we cast a reference from Inh to Base, we are just saying "it was originally an Inh, but you can treat the object stored at that address as a Base"? – Vitalii Vasylenko, Sep 23 '13 at 10:19
@VitaliiVasylenko casting `Inh` to `Base` is a no-op; the compiler and CLI already know that any `Inh` is definitely a `Base`. All that does is throw away a little bit of information (the fact that we also know it to be an `Inh`) - limiting the available operations to those that are available on `Base`. – Marc Gravell, Sep 23 '13 at 10:29

vgru · Answer 3 · 2013-09-23T12:25:40.873

To clarify what actually happens when you cast from one type to another, it may be helpful to mention some information about how instances of reference types are stored in the CLR.

First of all, there are value types (structs).

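local variables of value types are typically stored on the stack (this is an "implementation detail", and a struct that is a field of a class, or that has been boxed, lives on the heap, but for locals IMHO we can safely assume it's the way things are),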
they don't support inheritance (no virtual methods),
instances of value types contain only the values of their fields.

This means all methods and properties in a struct are basically static methods with this struct reference being passed as a parameter implicitly (again, there are one or two exceptions, like ToString, but mostly irrelevant).

So, when you do this:

struct SomeStruct 
{
    public int Value;
    public void DoSomething()
    {
        Console.WriteLine(this.Value);
    }
}

SomeStruct c; // this is placed on stack
c.DoSomething();

It will be logically the same as having a static method and passing the reference to the SomeStruct instance (the reference part is important because it allows the method to mutate the struct contents by writing to that stack memory area directly, without the need to box it):

struct SomeStruct 
{
    public int Value;
    public static void DoSomething(ref SomeStruct instance)
    {
        Console.WriteLine(instance.Value);
    }
}

SomeStruct c; // this is placed on stack
SomeStruct.DoSomething(ref c); // this passes a pointer to the stack and jumps to the method call

If you called DoSomething on a struct, there doesn't exist a different (overridden) method which may have to be invoked, and the compiler knows the actual function statically.

Reference types (classes) are a bit more complex.

instances of reference types are stored on the heap, and all variables or fields of a certain reference type merely hold a reference to the object on the heap. Assigning a value of a variable to another, as well as casting, simply copies the reference around, leaving the instance unchanged.
they support inheritance (virtual methods)
instances of reference types contain values of their fields, and some additional luggage related to GC, Synchronization, AppDomain identity and Type.
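
The difference between the two kinds of types shows up most clearly on assignment. The following sketch uses two hypothetical types, `PointStruct` and `PointClass`, that are not part of the question; they exist only to contrast copying a value with copying a reference.

```csharp
using System;

public struct PointStruct { public int X; }  // hypothetical value type
public class PointClass { public int X; }    // hypothetical reference type

public static class Program
{
    public static void Main()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;              // copies the whole value
        s2.X = 2;
        Console.WriteLine(s1.X);  // 1: s1 is an independent copy

        var c1 = new PointClass { X = 1 };
        var c2 = c1;              // copies only the reference
        c2.X = 2;
        Console.WriteLine(c1.X);  // 2: both variables refer to one object
    }
}
```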

If a class method is non-virtual, then it basically behaves like a struct method: it's known at compile time and it's not going to change, so compiler can emit a direct function call passing the object reference just like it did with a struct.

So, what happens when you cast to a different type? As far as the memory layout is concerned, nothing much.

If you have your object defined like you mentioned:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

And you instantiate an Inh, and then cast it to a Base:

Inh i1 = new Inh() { a = 2, b = 5 };
Base b2 = i1;

The heap memory will contain a single object instance (at, say, address 0x20000000):

// simplified memory layout of an `Inh` instance
[0x20000000]: Some synchronization stuff
[0x20000004]: Pointer to RTTI (runtime type info) for Inh
[0x20000008]: Int32 field (a = 2)
[0x2000000C]: Int32 field (b = 5)

Now, all variables of a reference type point to the location of the RTTI pointer (the actual object's memory area starts 4 bytes earlier, but that's not so important).

Both i1 and b2 contain a single pointer (0x20000004 in this example), and the only difference is that compiler will allow a Base variable to reference only the first field in that memory area (the a field), with no way to go further through the instance.

For the Inh instance i1, that same field is located at exactly the same offset, but it also has access to the next field b located 4 bytes after the first one (at 8 byte offset from the RTTI pointer).

So if you write this:

Console.WriteLine(i1.a);
Console.WriteLine(b2.a);

Compiled code will in both cases be the same (simplified, no type checks, just addressing):

For i1:

a. Get the address of i1 (0x20000004)

b. Add offset of 4 bytes to get the address of a (0x20000008)

c. Fetch the value at that address (2)

For b2:

a. Get the address of b2 (0x20000004)

b. Add offset of 4 bytes to get the address of a (0x20000008)

c. Fetch the value at that address (2)

So, the one and only instance of Inh is in memory, unmodified, and by doing a cast you are simply telling the compiler how to represent the data found at that memory location. Compared with plain C, C# will fail at runtime if you try to cast to an object which is not in the inheritance hierarchy, but a plain C program would happily return whatever is at the known fixed offset of a certain field in your instance. The only difference is that C# checks if what you are doing makes sense, but the type of the variable otherwise serves only to allow walking around the same object instance.

You can even cast it to an Object:

Object o1 = i1; // <-- this still points to `0x20000004`    
// Hm. Ok, that worked, but now what?

Again, the memory instance is unmodified, but there is nothing much you can do with a variable of Object, except downcast it again.

Virtual methods are even more interesting, because a call has to go through the mentioned RTTI pointer at runtime to get to the virtual method table for that type (allowing a type to override methods of a base type). This again means that the compiled code will simply use the fixed offset for a particular method, but the actual instance of the derived type will have the appropriate method implementation at that location in the table.
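
A short sketch of that difference in dispatch. The methods `Describe` and `NonVirtual` are hypothetical additions to the question's `Base` and `Inh` classes, made up for this illustration.

```csharp
using System;

public class Base
{
    public int a;
    public virtual string Describe() { return "Base"; }
    public string NonVirtual() { return "Base"; }
}

public class Inh : Base
{
    public int b;
    public override string Describe() { return "Inh"; }
    // "new" hides the base method instead of overriding it.
    public new string NonVirtual() { return "Inh"; }
}

public static class Program
{
    public static void Main()
    {
        Inh i1 = new Inh { a = 2, b = 5 };
        Base b2 = i1;  // same object, viewed as a Base

        // Virtual call: resolved at runtime from the object's actual type.
        Console.WriteLine(b2.Describe());   // Inh

        // Non-virtual call: resolved at compile time from the variable's type.
        Console.WriteLine(b2.NonVirtual()); // Base
        Console.WriteLine(i1.NonVirtual()); // Inh
    }
}
```

So the variable's declared type decides which members are visible and which non-virtual method is bound, while the object's own type decides which override runs.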

Just a quick question: so can we say that static class = struct? – Vitalii Vasylenko, Oct 03 '13 at 11:02
@VitaliiVasylenko: no, a class is always a reference type (instantiated on heap, and a class variable always holds a reference, or a "pointer" to that memory), while struct is always a value type (instantiated on stack, and passing the struct around always copies its entire contents). Using structs at all is rarely recommended in .NET, except for tiny structs (like `System.Drawing.Point` for example), and even then it's preferable to make them immutable (with `readonly` backing fields). – vgru, Oct 04 '13 at 09:46

score 0 · Answer 4 · answered Sep 23 '13 at 09:30

b2 is an Inh, but to the compiler it is a Base because you declared it as such.

Still, if you do (b2 as Inh).b = 2, it will work. The compiler then knows to treat it as an Inh and the CLR knows it's really an Inh already.

As Marc pointed out, if you use XML Serialization you will need to decorate the base class with a declaration per inheriting type.

Roy Dictus


The issue in the question seems to be surrounding *serialization* - not just what happens in memory. – Marc Gravell Sep 23 '13 at 09:32
@MarcGravell thanks for fulfilling answer. Actually, the question is about what happens in memory - xmlserialization is just a way to see, what's inside of array. Anyway, thanks for providing extra info, which is also useful. – Vitalii Vasylenko Sep 23 '13 at 09:36
@VitaliiVasylenko well, the xml serialization is the only time in the question you describe an actual issue. But no, xml serialization is **not** a way to see what's inside of the array. That is **completely unrelated**. – Marc Gravell Sep 23 '13 at 09:39
