Is there a way to mark the end of each protobuf-net record

Question

I am saving a series of protobuf-net objects in a database cell as a single Byte[] of length-prefixed protobuf-net messages:

// retrieve the existing protobufs from the database as a byte[]
object q = sql_agent_cmd.ExecuteScalar();
byte[] olderPbfs = (q as byte[]) ?? new byte[0]; // a NULL cell comes back as DBNull, not byte[]

// serialize the new pbf into MemoryStream m
// now write the old bytes and the new pbf's bytes into a memory stream and retrieve the sum

var s = new System.IO.MemoryStream();
s.Write(olderPbfs, 0, olderPbfs.Length);
s.Write(m.GetBuffer(), 0, m.ToArray().Length); // append the new bytes after the old ones
byte[] sumPbfs = s.ToArray();

// sumPbfs = old pbfs + new pbf; insert sumPbfs into the database

This works fine. My concern is what happens if there is slight database corruption. If a length prefix is damaged, it will no longer be possible to tell which byte is the next length prefix, and the entire cell contents would have to be discarded. Wouldn't it be advisable to also use some kind of end-of-pbf-object indicator (similar to the \n or EOF indicators used in text files)? That way, even if one record gets corrupted, the other records would be recoverable.

If so, what is the recommended way to add an end-of-record indicator at the end of each pbf?

Using protobuf-net v2 and C# in Visual Studio 2010.

Thanks, Manish

score 3 · Accepted Answer · answered May 15 '12 at 19:27

If you use a vanilla message via Serialize / Deserialize, then no: that isn't part of the specification (because the format is designed to be appendable).

If, however, you use SerializeWithLengthPrefix, it will dump the length at the start of the message; it will then know in advance how much data is expected. You deserialize with DeserializeWithLengthPrefix, and it will complain loudly if it doesn't have enough data. However! It will not complain if you have extra data, since this too is designed to be appendable.
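A minimal sketch of the length-prefixed workflow, assuming a hypothetical `Record` contract and protobuf-net v2's `Serializer.SerializeWithLengthPrefix` / `Serializer.DeserializeWithLengthPrefix` overloads that take a `PrefixStyle` and a field number. The choice of `PrefixStyle.Base128` and field number 1 is an assumption; any positive field number works as long as writer and reader agree. New records are appended to the existing bytes, as in the question, and the reader pulls them out one at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical message type, for illustration only.
[ProtoContract]
public class Record
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Text { get; set; }
}

public static class LengthPrefixedLog
{
    // Assumed field number; the writer and the reader must use the same value.
    const int FieldNumber = 1;

    // Returns the old cell contents with one more length-prefixed record appended.
    public static byte[] Append(byte[] oldBytes, Record record)
    {
        using (var ms = new MemoryStream())
        {
            if (oldBytes != null)
                ms.Write(oldBytes, 0, oldBytes.Length);
            Serializer.SerializeWithLengthPrefix(ms, record, PrefixStyle.Base128, FieldNumber);
            return ms.ToArray();
        }
    }

    // Each DeserializeWithLengthPrefix call consumes exactly one record and leaves
    // the stream positioned at the start of the next one.
    public static List<Record> ReadAll(byte[] cell)
    {
        var records = new List<Record>();
        using (var ms = new MemoryStream(cell))
        {
            while (ms.Position < ms.Length)
                records.Add(Serializer.DeserializeWithLengthPrefix<Record>(
                    ms, PrefixStyle.Base128, FieldNumber));
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] cell = new byte[0]; // stands in for the database cell
        cell = LengthPrefixedLog.Append(cell, new Record { Id = 1, Text = "first" });
        cell = LengthPrefixedLog.Append(cell, new Record { Id = 2, Text = "second" });

        foreach (Record r in LengthPrefixedLog.ReadAll(cell))
            Console.WriteLine(r.Id + ": " + r.Text);
    }
}
```

`Serializer.DeserializeItems<Record>(stream, PrefixStyle.Base128, 1)` is an alternative that yields the same records lazily. Neither approach detects a corrupted prefix; it only delimits well-formed records.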

Regarding Jon's reply: the default usage of the *WithLengthPrefix methods stores the data in exactly the layout Jon suggests; it pretends there is a wrapper object and behaves accordingly. The differences are:

- no wrapper object actually exists
- the *WithLengthPrefix methods explicitly stop after a single occurrence, rather than merging any later data into the same object (useful for, say, sending multiple discrete objects to a single file, or down a single socket)

The difference between the two kinds of "appendable" here is that the first means "merge into a single object", whereas the second means "I expect multiple records".
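To make the first kind of "appendable" concrete, here is a small sketch with a hypothetical `Record` type: two messages written back to back with plain `Serializer.Serialize` come back from `Serializer.Deserialize` as one merged object. Under standard protobuf merge rules, a scalar field takes the last value seen and a repeated field (a collection in protobuf-net) accumulates every value.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical message type, for illustration only.
[ProtoContract]
public class Record
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public List<string> Tags { get; set; }
    public Record() { Tags = new List<string>(); }
}

public static class MergeDemo
{
    public static Record MergeConcatenated(Record first, Record second)
    {
        using (var ms = new MemoryStream())
        {
            // Plain Serialize writes no length prefix, so nothing marks where 'first' ends.
            Serializer.Serialize(ms, first);
            Serializer.Serialize(ms, second);
            ms.Position = 0;
            // Deserialize reads to the end of the stream and merges everything into one object.
            return Serializer.Deserialize<Record>(ms);
        }
    }

    public static void Main()
    {
        var a = new Record { Name = "a" };
        a.Tags.Add("x");
        var b = new Record { Name = "b" };
        b.Tags.Add("y");

        Record merged = MergeConcatenated(a, b);
        Console.WriteLine(merged.Name);                   // scalar: the later value wins
        Console.WriteLine(string.Join(",", merged.Tags)); // repeated: values are concatenated
    }
}
```

To keep the two records separate, write them with the *WithLengthPrefix methods instead.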

Unrelated suggestion:

s.Write(m.GetBuffer(), 0, m.ToArray().Length);

should be:

s.Write(m.GetBuffer(), 0, (int)m.Length);

(no need to create an extra buffer: ToArray() copies the entire contents just so you can read the length)
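The reason the count must come from `m.Length` is that `MemoryStream.GetBuffer()` returns the internal buffer, whose size is the stream's capacity rather than the number of bytes written. A small standard-library sketch (the helper name `AppendStream` is hypothetical):

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Appends the valid contents of 'source' to 'destination' without an extra copy.
    public static void AppendStream(MemoryStream source, Stream destination)
    {
        // Only the first source.Length bytes of the internal buffer hold real data.
        destination.Write(source.GetBuffer(), 0, (int)source.Length);
    }

    public static void Main()
    {
        var m = new MemoryStream();
        m.Write(new byte[] { 1, 2, 3 }, 0, 3);

        Console.WriteLine(m.GetBuffer().Length); // internal capacity, typically larger than 3
        Console.WriteLine(m.Length);             // bytes actually written

        var s = new MemoryStream();
        AppendStream(m, s);
        Console.WriteLine(s.Length);
    }
}
```

`m.WriteTo(s)` is a built-in alternative that also copies only the valid bytes. Note that `GetBuffer()` throws `UnauthorizedAccessException` for a `MemoryStream` constructed over an existing `byte[]` unless it was created with `publiclyVisible` set to true.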

Thanks very much. A related question: does Serializer have a method that can be used to send the Byte[] sum-pbfs across the network? The standard mechanism seems to be to loop through the Byte[] until you reach the end, but something like Serializer.send(clientStream, sum-pbfs) would be great. — Manish, May 20 '12 at 10:56
@Manish I don't understand the question; what are "sum-pbfs"? If you use the WithLengthPrefix methods, it doesn't read past a single message. You can also create a fixed-length ProtoReader IIRC, and use that - or there may be a deserialize overload that accepts a fixed length (can't recall, but definitely very easily done on the public API) — Marc Gravell, May 20 '12 at 11:55
My apologies for the lack of clarity. sum-pbfs was in the code snippet above; it is essentially a Byte[] of serialized pbfs. All examples of sending a byte[] through a socket suggest that we loop through it 2 KB at a time. I was hoping there might be a simpler way of transmitting the Byte[]. But after your comments [here](http://stackoverflow.com/questions/10672895/interleave-protobuf-net-and-file), I think I will send one pbf at a time anyway. Thanks. — Manish, May 20 '12 at 21:59
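On the sending side, a single `Stream.Write(buffer, 0, buffer.Length)` call on a blocking stream such as `NetworkStream` hands over the whole array, so there is no need to chop the combined byte[] into 2 KB pieces yourself. The loop belongs on the receiving side, because `Stream.Read` may return fewer bytes than requested. The sketch below assumes a hypothetical 4-byte little-endian length header (not protobuf-net's own prefix format) and uses a `MemoryStream` in place of the socket; `Framing`, `WriteFrame`, `ReadFrame` and `ReadFully` are illustrative names.

```csharp
using System;
using System.IO;

public static class Framing
{
    // Writes a 4-byte little-endian length, then the whole payload in one call.
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        int n = payload.Length;
        var header = new byte[] { (byte)n, (byte)(n >> 8), (byte)(n >> 16), (byte)(n >> 24) };
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, n);
    }

    // Read may return fewer bytes than asked for, so keep reading until 'count' bytes arrive.
    static void ReadFully(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("stream ended in the middle of a frame");
            offset += read;
        }
    }

    public static byte[] ReadFrame(Stream stream)
    {
        var header = new byte[4];
        ReadFully(stream, header, 4);
        int length = header[0] | (header[1] << 8) | (header[2] << 16) | (header[3] << 24);
        if (length < 0)
            throw new InvalidDataException("negative frame length");
        var payload = new byte[length];
        ReadFully(stream, payload, length);
        return payload;
    }

    public static void Main()
    {
        var pipe = new MemoryStream(); // stands in for a NetworkStream
        WriteFrame(pipe, new byte[] { 1, 2, 3 });
        pipe.Position = 0;
        Console.WriteLine(ReadFrame(pipe).Length);
    }
}
```

If you send one pbf at a time, as planned above, protobuf-net's `SerializeWithLengthPrefix` can write to the network stream directly and already provides the framing.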

score 2 · Answer 2 · answered May 15 '12 at 19:04

(Note: I don't know much about protobuf-net itself, but this is generally applicable to Protocol Buffer messages.)

Typically if you want to record multiple messages, it's worth just putting a "wrapper" message - make the "real" message a repeated field within that. Each "real" message will then be length prefixed by the natural wire format of Protocol Buffers.
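In protobuf-net, a repeated field is modelled as a collection member, for example a `List<T>` marked with `[ProtoMember]`. A sketch of the wrapper idea with hypothetical `Record` and `RecordBatch` types: because repeated fields merge by concatenation, appending the bytes of a one-item wrapper to an existing serialized wrapper produces a wrapper containing every record, so the cell can still be extended by appending raw bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical message type, for illustration only.
[ProtoContract]
public class Record
{
    [ProtoMember(1)] public int Id { get; set; }
}

// Hypothetical wrapper: each Record is stored as repeated field 1, which the
// wire format length-prefixes automatically.
[ProtoContract]
public class RecordBatch
{
    [ProtoMember(1)] public List<Record> Items { get; set; }
    public RecordBatch() { Items = new List<Record>(); }
}

public static class WrapperDemo
{
    // Serializes a wrapper that holds a single record.
    public static byte[] SerializeOne(Record r)
    {
        var batch = new RecordBatch();
        batch.Items.Add(r);
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, batch);
            return ms.ToArray();
        }
    }

    public static byte[] Concat(byte[] a, byte[] b)
    {
        var result = new byte[a.Length + b.Length];
        Buffer.BlockCopy(a, 0, result, 0, a.Length);
        Buffer.BlockCopy(b, 0, result, a.Length, b.Length);
        return result;
    }

    public static RecordBatch Load(byte[] cell)
    {
        using (var ms = new MemoryStream(cell))
            return Serializer.Deserialize<RecordBatch>(ms);
    }

    public static void Main()
    {
        // Append raw bytes, as the question does with the database cell.
        byte[] cell = Concat(SerializeOne(new Record { Id = 1 }), SerializeOne(new Record { Id = 2 }));
        RecordBatch batch = Load(cell);
        Console.WriteLine(batch.Items.Count); // both records end up in Items
    }
}
```

This is the "merge into a single object" kind of appendability: the reader gets one `RecordBatch` back rather than a sequence of separate objects.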

This won't detect corruption of course - but to be honest, if the database ends up getting corrupted you've got bigger problems. You could potentially detect corruption, e.g. by keeping a hash along with each record... but you need to consider the possibility of the corruption occurring within the length prefix, or within the hash itself. Think about what you're really trying to achieve here - what scenarios you're trying to protect against, and what level of recovery you need.
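One way to act on the hash idea, sketched with assumed choices: each record is framed as a 4-byte little-endian length, the payload, and a SHA-256 hash of the payload. A record whose hash does not match is reported as damaged while its neighbours are still returned. As noted above, a damaged length prefix breaks the framing for everything after it, so the reader throws when a length is implausible rather than guessing. `ChecksummedLog`, `WriteRecord` and `ReadRecords` are hypothetical names; the payload would be a serialized pbf.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class ChecksummedLog
{
    const int HashSize = 32; // SHA-256 output length in bytes

    static byte[] Hash(byte[] data)
    {
        using (var sha = SHA256.Create())
            return sha.ComputeHash(data);
    }

    // Appends [4-byte length][payload][SHA-256(payload)] to the stream.
    public static void WriteRecord(Stream s, byte[] payload)
    {
        int n = payload.Length;
        s.Write(new byte[] { (byte)n, (byte)(n >> 8), (byte)(n >> 16), (byte)(n >> 24) }, 0, 4);
        s.Write(payload, 0, n);
        s.Write(Hash(payload), 0, HashSize);
    }

    // One entry per record: the payload if its hash matches, otherwise null.
    // Throws if a length prefix is implausible, because record boundaries are lost after that.
    public static List<byte[]> ReadRecords(byte[] data)
    {
        var result = new List<byte[]>();
        int pos = 0;
        while (pos < data.Length)
        {
            if (data.Length - pos < 4)
                throw new InvalidDataException("truncated length prefix at offset " + pos);
            int n = data[pos] | (data[pos + 1] << 8) | (data[pos + 2] << 16) | (data[pos + 3] << 24);
            pos += 4;
            if (n < 0 || n > data.Length - pos - HashSize)
                throw new InvalidDataException("implausible length prefix at offset " + (pos - 4));

            var payload = new byte[n];
            Buffer.BlockCopy(data, pos, payload, 0, n);
            pos += n;

            byte[] expected = Hash(payload);
            bool intact = true;
            for (int i = 0; i < HashSize; i++)
                if (data[pos + i] != expected[i]) { intact = false; break; }
            pos += HashSize;

            result.Add(intact ? payload : null);
        }
        return result;
    }

    public static void Main()
    {
        var log = new MemoryStream();
        WriteRecord(log, new byte[] { 1, 2, 3 });
        WriteRecord(log, new byte[] { 4, 5, 6 });
        WriteRecord(log, new byte[] { 7, 8, 9 });

        byte[] data = log.ToArray();
        data[43] ^= 0xFF; // flip a byte inside the second payload (offset 39 + 4)

        foreach (byte[] r in ReadRecords(data))
            Console.WriteLine(r == null ? "damaged" : "ok, first byte " + r[0]);
    }
}
```

The hash protects only the payload; protecting the length prefix as well, or resynchronizing after a bad prefix, needs extra design work, which is exactly the trade-off worth deciding up front.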

Thanks very much for the feedback. I finally found a way to see if protobuf-net allows repeated fields. It does not appear so. I can see an option of IsRequired = True for a field, but not an IsRepeated = True. — Manish, May 20 '12 at 10:50
@Manish: It definitely handles repeated fields. I'd expect them to be modelled as collections. — Jon Skeet, May 20 '12 at 12:29
