What's the right way to do polymorphism with protocol buffers?

Question

I'm trying to serialize, for the long term, a bunch of objects related by a strong class hierarchy in Java, and I'd like to use protocol buffers for it because of their simplicity, performance, and ease of upgrade. However, they don't provide much support for polymorphism. Right now I handle it with a "one message to rule them all" solution: a required string uri field lets me instantiate the correct type via reflection, followed by a bunch of optional fields for all the other possible classes I could serialize, only one of which is used (based on the value of the uri field). Is there a better way to handle polymorphism, or is this as good as I'm going to get?

Beware: in **proto3** extensions have been removed. – RickyA Mar 02 '16 at 13:10

score 52 · Answer 1 · edited Dec 20 '17 at 22:11

In proto3 the extend keyword has been replaced. From the docs: "If you are already familiar with proto2 syntax, the Any type replaces extensions."

syntax = "proto3";

import "google/protobuf/any.proto";

message Foo {
  google.protobuf.Any bar = 1;
}

But beware: Any is essentially a bytes blob. Most of the time it is better to use oneof:

syntax = "proto3";

message A {
    string a = 1;
}

message B {
    string b = 1;
}

message Foo {
  oneof bar {
    A a = 1;
    B b = 2;
  }
}
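On the Java side, reading a oneof usually comes down to a switch on the case enum that protoc generates. The sketch below is not generated code: `Foo`, `BarCase`, and the `ofA`/`ofB`/`empty` factories are hand-written stand-ins that only imitate the shape of the generated API for the schema above (a `getBarCase()` accessor and a `BAR_NOT_SET` value), so that it runs without the protobuf library. The single `getValue()` accessor is a simplification; generated code would expose `getA()` and `getB()` instead.

```java
public class OneofSketch {
    // Hand-written stand-in for the class protoc would generate from message Foo.
    static final class Foo {
        // Mirrors the generated case enum for `oneof bar`.
        enum BarCase { A, B, BAR_NOT_SET }

        private final BarCase barCase;
        private final String value; // payload of whichever member is set

        private Foo(BarCase barCase, String value) {
            this.barCase = barCase;
            this.value = value;
        }

        // Hypothetical factories; real code would go through Foo.newBuilder().
        static Foo ofA(String a) { return new Foo(BarCase.A, a); }
        static Foo ofB(String b) { return new Foo(BarCase.B, b); }
        static Foo empty() { return new Foo(BarCase.BAR_NOT_SET, ""); }

        BarCase getBarCase() { return barCase; }
        String getValue() { return value; }
    }

    // Dispatch on the oneof case: at most one member can be set at a time.
    static String describe(Foo foo) {
        switch (foo.getBarCase()) {
            case A:
                return "A(" + foo.getValue() + ")";
            case B:
                return "B(" + foo.getValue() + ")";
            default:
                return "nothing set";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Foo.ofA("hello")));
        System.out.println(describe(Foo.ofB("world")));
        System.out.println(describe(Foo.empty()));
    }
}
```

Because only one member of the oneof can be set, the switch cannot see a message that is both an A and a B, which is the guarantee a set of plain optional fields does not give.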

score 37 · Answer 2 · answered Nov 03 '11 at 18:59

There are a few techniques for implementing polymorphism. I try to cover them all here: Protocol Buffer Polymorphism

My preferred approach uses nested extensions:

message Animal
{
    extensions 100 to max;

    enum Type
    {
        Cat = 1;
        Dog = 2;
    }

    required Type type = 1;
}

message Cat
{
    extend Animal
    {
        optional Cat animal = 100; // Unique Animal extension number (extensions cannot be required)
    }

    // These fields can use the full number range.
    optional bool declawed = 1;
}

message Dog
{
    extend Animal
    {
        optional Dog animal = 101; // Unique Animal extension number (extensions cannot be required)
    }

    // These fields can use the full number range.
    optional uint32 bones_buried = 1;
}

Jon Parise

It seems to me (from Python experiments) that this approach still allows a message to use both extensions at the same time. So I don't see how this differs from the approach with optional message fields and type hint field, except for the more complex syntax. – MvG Feb 27 '13 at 11:43
Moreover, why do you need types here? Isn't it enough to use `HasExtension` to check which type the message is? – Ixanezis Apr 10 '14 at 06:27
Yes you can just use HasExtension to find out if a given Animal message contains a Cat or Dog extension. – Jamie Flournoy Jan 03 '15 at 00:22
It seems `required` is no longer supported for extensions – Andriy Drozdyuk Mar 04 '15 at 22:35
@drozzy furthermore, the [docu](https://developers.google.com/protocol-buffers/docs/proto?csw=1#extensions) page now says "This is a common source of confusion: Declaring an extend block nested inside a message type does not imply any relationship between the outer type and the extended type. In particular, the above example does not mean that Baz is any sort of subclass of Foo. All it means is that the symbol bar is declared inside the scope of Baz; it's simply a static member." – x29a Jun 15 '15 at 15:10
As @drozzy has pointed out, [this is no longer possible/valid](https://groups.google.com/d/topic/protobuf/EyR1EdDL5cs) as of 2.6. – Whymarrh Aug 19 '15 at 02:49
hi Jon Parise, thanks very much for your polymorphism document. I quite like option 2 discussed there (embed a pre-serialised payload) as the code plays out quite small and nice, but I'm wondering why you consider it inefficient? It seems to me that protocol buffers has to figure out which payload it is once, and then user code has to figure it out again to process it. Option 2 allows that figuring out to occur just once? thanks! – mattw Nov 04 '15 at 06:18
Has the new `oneof` feature changed this answer? – Kevin Krumwiede Mar 18 '16 at 08:07

score 5 · Answer 3 · answered May 06 '12 at 01:16

Jon's solution is correct and works, but it seems pretty weird to me. Protocol Buffers is quite simple, though, so you can do something like this:

enum Type {
  FOO = 0;
  BAR = 1;
}

message Foo {
  required Type type = 1;
}

message Bar {
  required Type type = 1;
  required string text = 2;
}

Basically, message Bar extends message Foo (in practical terms, of course). The implementation in Java is simple too:

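// Sender: build a Bar and serialize it.
Bar bar = Bar.newBuilder().setType(Type.BAR).setText("example").build();
byte[] data = bar.toByteArray();

// Receiver: parse as Foo first to read the shared type field
// (parseFrom throws InvalidProtocolBufferException on malformed input).
Foo foo = Foo.parseFrom(data);
if (foo.getType() == Type.BAR) {
    // The type says this is really a Bar, so parse the same bytes again.
    Bar parsed = Bar.parseFrom(data);
    System.out.println(parsed.getText());
}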

I know it's not an elegant solution, but it's simple and logical.
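Parsing the same bytes first as Foo and then as Bar relies on how fields are laid out on the wire: each field is prefixed with a tag holding its number and wire type, so a reader that does not know field 2 can step over it. The sketch below is a hand-rolled illustration of that, not the protobuf library: `encodeBar`, `readTypeAsFoo`, and the varint helpers are hypothetical names, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WireSketch {
    // Encode Bar { type = 1 (varint), text = 2 (length-delimited) } by hand.
    static byte[] encodeBar(int type, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((1 << 3) | 0);          // tag: field 1, wire type 0 (varint)
        writeVarint(out, type);
        out.write((2 << 3) | 2);          // tag: field 2, wire type 2 (length-delimited)
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Base-128 varint, least significant group first (non-negative values only here).
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Read the bytes the way a Foo reader would: keep field 1, skip everything else.
    // Returns the value of field 1, or -1 if it is absent.
    static int readTypeAsFoo(byte[] data) {
        int[] pos = {0};
        int type = -1;
        while (pos[0] < data.length) {
            int tag = readVarint(data, pos);
            int field = tag >>> 3;
            int wireType = tag & 7;
            if (wireType == 0) {
                int v = readVarint(data, pos);
                if (field == 1) type = v;
            } else if (wireType == 2) {
                int len = readVarint(data, pos);
                pos[0] += len;            // not a Foo field: skip the payload
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return type;
    }

    static int readVarint(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    public static void main(String[] args) {
        byte[] data = encodeBar(1, "example");
        System.out.println("type seen by the Foo-style reader: " + readTypeAsFoo(data));
    }
}
```

The trick only holds while field 1 keeps the same number and type in every message that shares the prefix.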

This is fine, if Foo and Bar belong to the same service. The reason to use message extensions is to allow clients to extend messages belonging to a service. So a storage service could export Foo and allow extensions, and then a client could decide that they also want to store Bar data inside that Foo. The storage service would publish foo.proto as part of its API definition, and the client would extend that with bar.proto that imports Foo from foo.proto. — Jamie Flournoy, Jan 03 '15 at 00:27

score 3 · Answer 4 · answered Jun 10 '10 at 21:58

Check out Extensions and Nested Extensions for a slightly cleaner way to do this.

bbudge

score 0 · Answer 5 · answered Jun 10 '10 at 21:57

Have you considered using extensions? You could have your uri field determine the type to use and then just load the appropriate extensions. If you know your fields are mutually exclusive then you could reuse the field id between separate extensions.

You have to handle this all yourself because protocol buffers aren't designed to be self-describing beyond a simple list of values. This is touched on in the Google techniques page.
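"Handling it yourself" typically means shipping a type identifier next to the serialized payload and mapping that identifier to a decoder on the reading side. Here is a minimal sketch in plain Java, with no protobuf dependency: `Envelope`, `DECODERS`, and the `example/...` identifiers are all hypothetical, and the decoders just inspect raw bytes where real code would call a generated parser. A lookup table like this is one way to avoid instantiating classes by reflection from the uri.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TypeTagSketch {
    // Hypothetical envelope: a type identifier plus an opaque serialized payload.
    static final class Envelope {
        final String uri;
        final byte[] payload;

        Envelope(String uri, byte[] payload) {
            this.uri = uri;
            this.payload = payload;
        }
    }

    // Registry from type identifier to decoder. In real code each decoder would
    // wrap a generated parser (for example, a message's parseFrom method).
    static final Map<String, Function<byte[], Object>> DECODERS = new HashMap<>();
    static {
        DECODERS.put("example/Text", bytes -> new String(bytes, StandardCharsets.UTF_8));
        DECODERS.put("example/Length", bytes -> bytes.length);
    }

    // Look up the decoder for the envelope's type and apply it to the payload.
    static Object decode(Envelope e) {
        Function<byte[], Object> decoder = DECODERS.get(e.uri);
        if (decoder == null) {
            throw new IllegalArgumentException("unknown type: " + e.uri);
        }
        return decoder.apply(e.payload);
    }

    public static void main(String[] args) {
        byte[] hi = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(new Envelope("example/Text", hi)));
        System.out.println(decode(new Envelope("example/Length", hi)));
    }
}
```

An unknown identifier fails loudly here; whether to fail, skip, or keep the raw bytes for later is a design choice that depends on how long-lived the stored data is.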

score 0 · Answer 6 · answered Oct 01 '15 at 13:55

A solution that I find a little better than @Łukasz Marciniak's answer.

If Bar extends Foo, simply write:

message Bar {
   optional Foo foo = 1;
   optional double aDouble = 2;
}
message Foo {
   optional string aString = 1;
}

So if Foo evolves, only the Foo message needs to be modified.
