6

I'm trying to determine how to address this use case using protobuf-net (Marc Gravell's implementation).

  • We have class A, which is considered version 1
  • An instance of class A has been serialized to disk
  • We now have class B, which is considered version 2 of class A (there were so many things wrong with class A, we had to create class B for the next version). Class A still exists in code, but only for legacy purposes.
  • I want to deserialize the version:1 data (stored to disk) as a class B instance, and use a logic routine to translate the data from the previous class A instance to a new instance of class B.
  • The instance of class B will be serialized to disk during operation.
  • The application should expect to deserialize instances of both class A and B.

The concept of data contract surrogates and the DataContractSerializer come to mind. The goal is transition the version:1 data to the new class B structure.

An example:

[DataContract]
public class A {

     public A(){}

     [DataMember]
     public bool IsActive {get;set;]

     [DataMember]
     public int VersionNumber {
          get { return 1; }
          set { }
     }

     [DataMember]
     public int TimeInSeconds {get;set;}

     [DataMember]
     public string Name {get;set;}

     [DataMember]
     public CustomObject CustomObj {get;set;} //Also a DataContract

     [DataMember]
     public List<ComplexThing> ComplexThings {get;set;} //Also a DataContract
     ...
}

[DataContract]
public class B {

     public B(A a) {
          this.Enabled = a.IsActive; //Property now has a different name
          this.TimeInMilliseconds = a.TimeInSeconds * 1000; //Property requires math for correctness
          this.Name = a.Name;
          this.CustomObject2 = new CustomObject2(a.CustomObj); //Reference objects change, too
          this.ComplexThings = new List<ComplexThings>();
          this.ComplexThings.AddRange(a.ComplexThings);
          ...
     }

     public B(){}

     [DataMember]
     public bool Enabled {get;set;]

     [DataMember]
     public int Version {
          get { return 2; }
          set { }
     }

     [DataMember]
     public double TimeInMilliseconds {get;set;}

     [DataMember]
     public string Name {get;set;}

     [DataMember]
     public CustomObject2 CustomObject {get;set;} //Also a DataContract

     [DataMember]
     public List<ComplexThing> ComplexThings {get;set;} //Also a DataContract
     ...
}

Class A was the first iteration of our object, and is actively in use. Data exists in v1 format, using class A for serialization.

After realizing the error of our ways, we create a new structure called class B. There are so many changes between A and B that we feel it's better to create B, as opposed to adapting the original class A.

But our application already exists and class A is being used to serialize data. We're ready to roll our changes out to the world, but we must first deserialize data created under version 1 (using class A) and instantiate it as class B. The data is significant enough that we can't just assume defaults in class B for missing data, but rather we must transition the data from a class A instance to class B. Once we have a class B instance, the application will serialize that data again in class B format (version 2).

We're assuming we'll make modifications to class B in the future, and we want to be able to iterate to a version 3, perhaps in a new class "C". We have two goals: address data already in existence, and prepare our objects for future migration.

The existing "transition" attributes (OnSerializing/OnSerialized,OnDeserializing/OnDeserialized,etc.) don't provide access to the previous data.

What is the expected practice when using protobuf-net in this scenario?

jro
  • 7,372
  • 2
  • 23
  • 36
  • do you know in advance whether you are reading v1 vs v2 data? how different are v1 & v2? on your 4th bullet - why would you read v1 data into a B, rather than read it into an A then convert the A to a B? – Marc Gravell Jan 24 '13 at 22:59
  • I won't know it. To expand upon this, I have to assume that I may encounter V1, V2, V3, V4, etc. And the difference could be substantial -- think replacement of entire sets of classes. – jro Jan 24 '13 at 23:11
  • "why would you read v1 data into a B, rather than read it into an A then convert the A to a B" - definitely want to do this, but I won't know if it's A or B data ahead of time. – jro Jan 24 '13 at 23:49
  • does the v1 da already exist? I can think of ways to do this if we still have the ability to control a master root node (above the A). I could still really do with a basic example that illustrates A and B – Marc Gravell Jan 25 '13 at 07:30
  • Yes, v1 data already exists. Editing above to include examples for A and B. – jro Jan 25 '13 at 13:23

2 Answers2

5

Right; looking at them you have indeed completely changed the contract. I know of no contract-based serializer that is going to love you for that, and protobuf-net is no different. If you already had a root node, you could do something like (in pseudo-code):

[Contract]
class Wrapper {
    [Member] public A A {get;set;}
    [Member] public B B {get;set;}
    [Member] public C C {get;set;}
}

and just pick whichever of A/B/C is non-null, perhaps adding some conversion operators between them. However, if you just have a naked A in the old data, this gets hard. There are two approaches I can think of:

  • add lots of shim properties for compatibility; not very maintainable, and I don't recommend it
  • sniff the Version as an initial step, and tell the serializer what to expect.

For example, you could do:

int version = -1;
using(var reader = new ProtoReader(inputStream)) {
    while(reader.ReadFieldHeader() > 0) {
        const int VERSION_FIELD_NUMBER = /* todo */;
        if(reader.FieldNumber == VERSION_FIELD_NUMBER) {
            version = reader.ReadInt32();
            // optional short-circuit; we're not expecting 2 Version numbers
            break;
        } else {
            reader.SkipField();
        }
    }
}
inputStream.Position = 0; // rewind before deserializing

Now you can use the serializer, telling it what version it was serialized as; either via the generic Serializer.Deserialize<T> API, or via a Type instance from the two non-generic APIs (Serializer.NonGeneric.Deserialize or RuntimeTypeModel.Default.Deserialize - either way, you get to the same place; it is really a case of whether generic or non-generic is most convenient).

Then you would need some conversion code between A / B / C - either via your own custom operators / methods, or by something like auto-mapper.

If you don't want any ProtoReader code kicking around, you could also do:

[DataContract]
class VersionStub {
    [DataMember(Order=VERSION_FIELD_NUMBER)]
    public int Version {get;set;}
}

and run Deserialize<VersionStub>, which will give you access to the Version, which you can then use to do the type-specific deserialize; the main difference here is that the ProtoReader code allows you to short-circuit as soon as you have a version-number.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Yeah this where I've been headed -- get a standard around version, interrogate it, then send off to some basic method for conversion. In case it's not yet obvious, these aren't my requirements (yay, inherited code.) – jro Jan 29 '13 at 00:51
2

I don't have an expected practice, but this is what I'd do.

Given you still have access to your V1 class add a property on your V1 class that provides a V2 instance.

In your ProtoAfterDeserialization of V1 create an instance of V2 and seeing it's a Migration I'd suggest manually transfer across what you need (or if it's not too hard, try Merge YMMV).

Also in your ProtoBeforeSerialization throw some form of exception so that you don't attempt to write out the old one any more.

Edit: Examples of using these (VB code)

<ProtoBeforeSerialization()>
Private Sub BeforeSerialisaton()

End Sub

<ProtoAfterSerialization()>
Private Sub AfterSerialisaton()

End Sub

<ProtoBeforeDeserialization()>
Private Sub BeforeDeserialisation()

End Sub

<ProtoAfterDeserialization()>
Private Sub AfterDeserialisation()

End Sub

after seeing your example I hope this satisfied what you are trying to do. Class1 is how you load/convert.

using ProtoBuf;
using System.Collections.Generic;
using System.IO;

public class Class1
{
    public Class1()
    {
        using (FileStream fs = new FileStream("c:\\formatADataFile.dat",
               FileMode.Open, FileAccess.Read))
        {
            A oldA = Serializer.Deserialize<A>(fs);
            B newB = oldA.ConvertedToB;
        }
    }
}


[ProtoContract()]
public class B
{

    public B(A a)
    {
        //Property now has a different name
        this.Enabled = a.IsActive; 
        //Property requires math for correctness
        this.TimeInMilliseconds = a.TimeInSeconds * 1000; 
        this.Name = a.Name;
        //Reference objects change, too
        this.CustomObject2 = new CustomObject2(a.CustomObj); 
        this.ComplexThings = new List<ComplexThings>();
        this.ComplexThings.AddRange(a.ComplexThings);
        //...
    }

    public B() { }

    //[DataMember]
    [ProtoMember(1)]
    public bool Enabled { get; set; }

    //[DataMember]
    public int Version
    {
        get { return 2; }
        private set { }
    }

    [ProtoMember(2)]
    public double TimeInMilliseconds { get; set; }

    [ProtoMember(3)]
    public string Name { get; set; }

    [ProtoMember(4)]
    public CustomObject2 CustomObject { get; set; } //Also a DataContract

    [ProtoMember(5)]
    public List<ComplexThing> ComplexThings { get; set; } //Also a DataContract

    ///...
}

[ProtoContract()]
public class CustomObject2
{
    public CustomObject2()
    {
        Something = string.Empty;
    }

    [ProtoMember(1)]
    public string Something { get; set; }
}


[ProtoContract()]
public class A
{

    public A()
    {
        mBConvert = new B();
    }

    [ProtoMember(1)]
    public bool IsActive { get; set; }

    [ProtoMember(2)]
    public int VersionNumber
    {
        get { return 1; }
        private set { }
    }

    [ProtoBeforeSerialization()]
    private void NoMoreSavesForA()
    {
        throw new System.InvalidOperationException("Do Not Save A");
    }

    private B mBConvert;

    [ProtoAfterDeserialization()]
    private void TranslateToB()
    {
        mBConvert = new B(this);
    }

    public B ConvertedToB
    {
        get
        {
            return mBConvert;
        }
    }



    [ProtoMember(3)]
    public int TimeInSeconds { get; set; }

    [ProtoMember(4)]
    public string Name { get; set; }

    [ProtoMember(5)]
    public CustomObject CustomObj { get; set; } //Also a DataContract

    [ProtoMember(6)]
    public List<ComplexThing> ComplexThings { get; set; } //Also a DataContract
    //...
}

[ProtoContract()]
public class CustomObject
{
    public CustomObject()
    {

    }
    [ProtoMember(1)]
    public int Something { get; set; }
}

[ProtoContract()]
public class ComplexThing
{
    public ComplexThing()
    {

    }
    [ProtoMember(1)]
    public int SomeOtherThing { get; set; }
}
Paul Farry
  • 4,730
  • 2
  • 35
  • 61
  • Thanks Paul. Can you provide some code around the onDeserialized call? I'm not seeing where I could manually transfer data across (the source objects, I mean.) – jro Jan 24 '13 at 00:53
  • Ok, maybe I wasn't being clear. I have data that is serialized in V1, but I want to operate in V2 mode. The data will be serialized *again*, going forward in V2 format. So while this approach would work for data that's in V1 format, it doesn't work when it encounters that same data that's now serialized in V2 format. – jro Jan 24 '13 at 15:32
  • V2 data should deserialise into your V2 object. V1 is legacy from my understanding and you just need a way to migrate from V1 -> V2 – Paul Farry Jan 24 '13 at 22:38