20

Is it possible to detect the type of a raw protocol buffer message (in byte[])?

I have a situation where an endpoint can receive different messages and I need to be able to detect the type before I can deserialize it.

I am using protobuf-net.

Yavor Shahpasov

4 Answers

19

You can't detect the type in isolation, since the protobuf spec doesn't add any data to the stream for this; however, there are a number of ways of making this easy, depending on the context:

  • a union type (as mentioned by Jon) covers a range of scenarios
  • inheritance (protobuf-net specific) can be versatile - you can have a base-message type, and any number of concrete message types
  • you can use a prefix to indicate the incoming type

The last approach is actually very valuable in the case of raw TCP streams. On the wire it is identical to the union type, but with a different implementation. By deciding in advance that 1=Foo, 2=Bar etc. (exactly as you do for the union type approach), you can use SerializeWithLengthPrefix to write (specifying the 1/2/etc. as the field number) and the non-generic TryDeserializeWithLengthPrefix to read. The latter is under Serializer.NonGeneric in the v1 API; on the TypeModel in the v2 API the equivalent is DeserializeWithLengthPrefix together with a TypeResolver. When reading, you provide a type-map that resolves the numbers back to types, and hence deserialize the correct type. And to pre-empt the question "why is this useful with TCP streams?": in an ongoing TCP stream you need to use the WithLengthPrefix methods anyway, to avoid over-reading the stream, so you might as well get the type identifier for free!
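To make the "identical on the wire" point concrete, here is a rough sketch in plain Python with no protobuf library at all. It assumes the prefix is written as a standard protobuf field header (field number plus the length-delimited wire type) followed by a varint length. The names write_frame, read_frame and type_map are made up for illustration, and the payloads are placeholder bytes rather than real serialized messages.

```python
import io

FOO, BAR = 1, 2  # hypothetical field numbers agreed in advance (1=Foo, 2=Bar)
WIRE_LEN = 2     # protobuf wire type for length-delimited fields


def write_varint(out, value):
    """Write a non-negative int as a base-128 varint."""
    while value > 0x7F:
        out.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    out.write(bytes([value]))


def read_varint(src):
    """Read a varint; return None if the stream ends before the first byte."""
    result = shift = 0
    while True:
        raw = src.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        if raw[0] < 0x80:
            return result
        shift += 7


def write_frame(out, field_number, payload):
    """Write header (field number + wire type), length, then the payload."""
    write_varint(out, (field_number << 3) | WIRE_LEN)
    write_varint(out, len(payload))
    out.write(payload)


def read_frame(src):
    """Return (field_number, payload), or None at a clean end of stream."""
    header = read_varint(src)
    if header is None:
        return None
    if header & 0x07 != WIRE_LEN:
        raise ValueError("expected a length-delimited field")
    length = read_varint(src)
    if length is None:
        raise EOFError("stream ended before the length prefix")
    payload = src.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a payload")
    return header >> 3, payload


# Usage: two messages back to back on one stream, resolved via a type map.
stream = io.BytesIO()
write_frame(stream, FOO, b"foo-bytes")
write_frame(stream, BAR, b"bar-bytes")
stream.seek(0)

type_map = {FOO: "Foo", BAR: "Bar"}  # number -> type, as the resolver would do
frames = []
while True:
    frame = read_frame(stream)
    if frame is None:
        break
    frames.append((type_map[frame[0]], frame[1]))
```

Because the length is read before the payload, the reader never consumes bytes belonging to the next message, which is the over-reading concern mentioned above.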

summary:

  • union type: easy to implement; the only downside is having to then check which of the properties is non-null
  • inheritance: easy to implement; can use polymorphism or discriminator to handle "what now?"
  • type prefix: a bit more fiddly to implement, but allows more flexibility, and has zero overhead on TCP streams
Marc Gravell
  • Hi, thanks for the detailed reply. I've managed to accomplish the desired result using Serializer.SerializeWithLengthPrefix and Serializer.NonGeneric.TryDeserializeWithLengthPrefix. I would like to use the RuntimeTypeModel to avoid having attributes on my classes. model.SerializeWithLengthPrefix seems to work as expected, but I could not find an equivalent of TryDeserializeWithLengthPrefix on the RuntimeTypeModel; model.DeserializeWithLengthPrefix seems to expect a type. How should I accomplish the equivalent of TryDeserializeWithLengthPrefix using the model? Using version 2.0.0.480 from NuGet. – Yavor Shahpasov Feb 03 '12 at 10:18
  • I've changed my code to add the model definitions to RuntimeTypeModel.Default, and I can now use TryDeserializeWithLengthPrefix without attributes. Before, I was creating a new model using TypeModel.Create. So problem solved, thanks. – Yavor Shahpasov Feb 03 '12 at 11:11
  • @Yavor on TypeModel, it is just DeserializeWithLengthPrefix, along with a TypeResolver. Not sure where the Try went! I'll look for it behind the cushions on the sofa - that is where most things end up. – Marc Gravell Feb 03 '12 at 13:51
13

One typical option is to have a wrapper message to act as an "option type" or discriminated union. You could have an enum (with one value per message type) and a wrapper message containing a field that holds the message type, plus one optional field per message type.

This is described in the Protobuf documentation as a "union type".
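The receiving side of a union type then boils down to "find the one field that is set". A minimal sketch of that check, using plain Python dataclasses as stand-ins for the generated message classes (Foo, Bar, Envelope and unwrap are hypothetical names, not part of any protobuf library):

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Foo:
    name: str


@dataclass
class Bar:
    count: int


@dataclass
class Envelope:
    """Union wrapper: one optional field per message type."""
    foo: Optional[Foo] = None
    bar: Optional[Bar] = None


def unwrap(envelope):
    """Return the single populated field, or fail if zero or several are set."""
    present = [
        getattr(envelope, f.name)
        for f in fields(envelope)
        if getattr(envelope, f.name) is not None
    ]
    if len(present) != 1:
        raise ValueError(f"expected exactly one field set, found {len(present)}")
    return present[0]


message = unwrap(Envelope(bar=Bar(count=3)))
```

After unwrapping, the concrete type of the result (here Bar) tells you which handler to call.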

Jon Skeet
7

You could wrap it like this, where data holds the actual serialized message:

message MyCustomProtocol {
  required int32 protocolVersion = 1;
  required int32 messageType = 2;
  required bytes data = 3;
}

A general rule for protocols is to include a protocol version. You will be very happy to have it once you have old and new clients.
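For what it's worth, here is roughly how a receiver could use such a wrapper: decode the envelope, check the version, then pick a handler by messageType. This is only a sketch in plain Python that hand-rolls the protobuf wire encoding for the three fields above. It assumes non-negative values for the two int32 fields, and SUPPORTED_VERSION, HANDLERS and dispatch are invented for the example.

```python
SUPPORTED_VERSION = 1  # hypothetical: the only protocol version we understand


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_envelope(version, message_type, data):
    """Serialize MyCustomProtocol: fields 1 and 2 as varints, field 3 as bytes."""
    return (
        b"\x08" + encode_varint(version)             # field 1, wire type 0
        + b"\x10" + encode_varint(message_type)      # field 2, wire type 0
        + b"\x1a" + encode_varint(len(data)) + data  # field 3, wire type 2
    )


def decode_envelope(buf):
    """Parse the envelope; return (version, message_type, data)."""
    found, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            found[number], pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            found[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return found[1], found[2], found[3]


# Hypothetical handlers; real ones would deserialize `data` into Foo or Bar.
HANDLERS = {
    1: lambda data: ("Foo", data),
    2: lambda data: ("Bar", data),
}


def dispatch(buf):
    """Check the version first, then route the payload by message type."""
    version, message_type, data = decode_envelope(buf)
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported protocol version {version}")
    if message_type not in HANDLERS:
        raise ValueError(f"unknown message type {message_type}")
    return HANDLERS[message_type](data)


result = dispatch(encode_envelope(1, 2, b"xyz"))
```

Checking the version before looking at messageType means an old client fails with a clear error instead of misinterpreting a type number that changed meaning.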

Darwin
2

You could use a technique called Self Describing Messages. The wrapper carries a set of descriptors (the .proto definitions of the message type and its dependencies) alongside the message itself, which is encoded as an 'Any'. An example from the docs:

syntax = "proto3";

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

message SelfDescribingMessage {
  // Set of FileDescriptorProtos which describe the type and its dependencies.
  google.protobuf.FileDescriptorSet descriptor_set = 1;

  // The message and its type, encoded as an Any message.
  google.protobuf.Any message = 2;
}

It should be noted that native support for these messages at the time of writing this response is only available in C++ and Java.
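Even without full descriptor support, the Any part is enough to identify the type: its type_url ends with the fully qualified message name after the last slash. A small sketch of resolving that name against your own table of known types, in plain Python (REGISTRY, resolve and the example.* names are placeholders I made up):

```python
def type_name_from_url(type_url):
    """Return the fully qualified message name after the last '/'."""
    _, slash, name = type_url.rpartition("/")
    if not slash or not name:
        raise ValueError(f"malformed type URL: {type_url!r}")
    return name


# Hypothetical registry: message name -> function that decodes the Any payload.
REGISTRY = {
    "example.Foo": lambda payload: ("Foo", payload),
    "example.Bar": lambda payload: ("Bar", payload),
}


def resolve(type_url, payload):
    """Look up the decoder for an Any's type_url and apply it to its value."""
    name = type_name_from_url(type_url)
    if name not in REGISTRY:
        raise ValueError(f"unknown message type {name}")
    return REGISTRY[name](payload)


decoded = resolve("type.googleapis.com/example.Foo", b"\x08\x01")
```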

Eelke van den Bos