
I need to store big data (on the order of gigabytes) to a stream using protobuf-net 2.4.0.

Currently, my strategy is to write a small header with SerializeWithLengthPrefix using PrefixStyle.Base128, followed by the big body written with the standard protobuf serialization approach, as in the code below, and it works like a charm.

private void Serialize(Stream stream)
{
    Model.SerializeWithLengthPrefix(stream, FileHeader, typeof(FileHeader), PrefixStyle.Base128, 1);

    if (FileHeader.SerializationMode == serializationType.Compressed)
    {
        using (var gzip = new GZipStream(stream, CompressionMode.Compress, true))
        using (var bs = new BufferedStream(gzip, GZIP_BUFFER_SIZE))
        {
            Model.Serialize(bs, FileBody);
        }
    }
    else
    {
        Model.Serialize(stream, FileBody);
    }
}
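
For reference, the read path is the mirror image (a sketch; Model, FileHeader, FileBody, and GZIP_BUFFER_SIZE are the same members and constant used above):

private void Deserialize(Stream stream)
{
    // Read the small header back first, with the same field number (1).
    FileHeader = (FileHeader)Model.DeserializeWithLengthPrefix(
        stream, null, typeof(FileHeader), PrefixStyle.Base128, 1);

    if (FileHeader.SerializationMode == serializationType.Compressed)
    {
        using (var gzip = new GZipStream(stream, CompressionMode.Decompress, true))
        using (var bs = new BufferedStream(gzip, GZIP_BUFFER_SIZE))
        {
            FileBody = (FileBody)Model.Deserialize(bs, null, typeof(FileBody));
        }
    }
    else
    {
        FileBody = (FileBody)Model.Deserialize(stream, null, typeof(FileBody));
    }
}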

Now I need to split the body into two different objects, so I have to use the length-prefix approach for them too, but I don't know which PrefixStyle is best in this scenario. Can I continue to use Base128? And what does "useful for compatibility" mean in the description of Fixed32?
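
For context, my understanding (which may be wrong) is that Fixed32 writes the length as a raw 4-byte little-endian integer, so even a non-protobuf reader could frame the messages, something like:

// Hypothetical framing reader: reads one Fixed32-prefixed message as raw bytes.
static byte[] ReadFixed32Frame(Stream stream)
{
    using (var reader = new BinaryReader(stream, System.Text.Encoding.UTF8, leaveOpen: true))
    {
        int length = reader.ReadInt32();   // 4-byte little-endian prefix
        return reader.ReadBytes(length);   // the serialized message body
    }
}

Is that the kind of "compatibility" the documentation means?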

UPDATE

I found this post where Marc Gravell explains that there is an option to use a start-marker and an end-marker, but I'm not sure whether it can be combined with the length-prefix approach. To be clear: is the approach shown in the code below valid?

[ProtoContract]
public class FileHeader
{
    [ProtoMember(1)]
    public int Version { get; }
    [ProtoMember(2)]
    public string Author { get; set; }
    [ProtoMember(3)]
    public string Organization { get; set; }
}

[ProtoContract(IsGroup = true)] // can IsGroup=true help with LengthPrefix for big data?
public class FileBody1
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Foo1> Foo1s { get; }
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<Foo2> Foo2s { get; }
    [ProtoMember(3, DataFormat = DataFormat.Group)]
    public List<Foo3> Foo3s { get; }
}

[ProtoContract(IsGroup = true)] // can IsGroup=true help with LengthPrefix for big data?
public class FileBody2
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Foo4> Foo4s { get; }
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<Foo5> Foo5s { get; }
    [ProtoMember(3, DataFormat = DataFormat.Group)]
    public List<Foo6> Foo6s { get; }
}

public static class Helper
{
    private static void SerializeFile(Stream stream, FileHeader header, FileBody1 body1, FileBody2 body2)
    {
        var model = RuntimeTypeModel.Create();

        var serializationContext = new ProtoBuf.SerializationContext();

        model.SerializeWithLengthPrefix(stream, header, typeof(FileHeader), PrefixStyle.Base128, 1);
        model.SerializeWithLengthPrefix(stream, body1, typeof(FileBody1), PrefixStyle.Base128, 1, serializationContext);
        model.SerializeWithLengthPrefix(stream, body2, typeof(FileBody2), PrefixStyle.Base128, 1, serializationContext);
    }

    private static void DeserializeFile(Stream stream, ref FileHeader header, ref FileBody1 body1, ref FileBody2 body2)
    {
        var model = RuntimeTypeModel.Create();

        var serializationContext = new ProtoBuf.SerializationContext();

        header = model.DeserializeWithLengthPrefix(stream, null, typeof(FileHeader), PrefixStyle.Base128, 1) as FileHeader;
        body1 = model.DeserializeWithLengthPrefix(stream, null, typeof(FileBody1), PrefixStyle.Base128, 1, null, out _, out _, serializationContext) as FileBody1;
        body2 = model.DeserializeWithLengthPrefix(stream, null, typeof(FileBody2), PrefixStyle.Base128, 1, null, out _, out _, serializationContext) as FileBody2;
    }
}
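
For completeness, I would call the helpers like this (assuming they are made public; "data.bin" is a hypothetical path):

using (var stream = File.Create("data.bin"))
    Helper.SerializeFile(stream, header, body1, body2);

FileHeader headerRead = null;
FileBody1 body1Read = null;
FileBody2 body2Read = null;
using (var stream = File.OpenRead("data.bin"))
    Helper.DeserializeFile(stream, ref headerRead, ref body1Read, ref body2Read);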

If so, I suppose I can keep storing big data without worrying about the length prefix (the marker indicating the length of each message).

ilCosmico
1 Answer


Base128 is probably the best general-purpose choice, simply because it maintains protocol compatibility (the others do not). What I would suggest, though, is that for very large files, using "group" mode on the collections (and on sub-objects in general) may be highly desirable; it makes serialization faster, because the writer never has to calculate length prefixes for large object graphs.
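
For illustration, a minimal sketch of what "group" mode looks like on a contract (Batch and Record are hypothetical types, not from the question):

[ProtoContract]
public class Batch
{
    // Group encoding: start/end markers instead of a length prefix, so the
    // writer streams forward-only and never measures the payload up front.
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Record> Records { get; } = new List<Record>();
}

[ProtoContract]
public class Record
{
    [ProtoMember(1)]
    public long Id { get; set; }
    [ProtoMember(2)]
    public string Payload { get; set; }
}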

Marc Gravell
  • As far as I can understand, my approach is valid and good for big data. I just updated the code with your last hint. Is it what you meant? – ilCosmico Sep 27 '22 at 10:07