proto3 - oneof vs fields with identifier

Question

I am writing a proto3 class for an object which currently have around 2 variations, and will grow up to 6 or 7. Only one of them would be used in a message. These variations do not share common fields. They will be encoded as a submessage in a parent message. These messages would be written once and read tens of thousands of time.

I was wondering what would be the most performant way, memory and time of parsing wise, to achieve this so that as more variations are added, the performance is not lost.

Consider the following variations.

message B1 {
    repeated string value = 1;
    bool hasMeta = 2;
}

message B2 {
    repeated int32 value = 1;
    map<string, string> foo = 2;
}

First option: define an oneof field that refers the specific subtype.

message P1 {
    oneof parents {
        B1 boo = 1;
        B2 baz = 2;
        // add more variations here in future..
    }
    // other non-related fields...
}

Second option: define an integer that acts as an identifier for the available variation. At runtime this integer can be used to determine which variation has been set (another way is to null check the variations and use the first non-null).

message P1 {
    int32 type = 1;
    B1 boo = 2;
    B2 baz = 3;
    // other non-related fields...
}

I am particularly interested in the wire size and performance.

In second option, considering only one of the variations would be set (enforced in app layer), will the wire size be more than that in first? Is memory reserved for null fields as well?

score 2 · Accepted Answer · answered Mar 24 '20 at 11:56

The oneof method is slightly better compared to the message where you define the variable type with respect to processing power and wire size. Protobuf always serializes the tag number before a nested message. So for the oneof message it is not required to serialize a variable like type. Making it's wire size slightly smaller compared to the second message definition.

With respect to memory allocation this highly depends on the programming language you are using and how they have implemented oneof's and nested messages. If I am not mistaken the default C++ implementation dynamically allocates memory for sub messages. I suspect no difference here between either of your suggestions. Looking at NanoPB however, there oneof's are implemented as unions allocating only memory for the bigger message. This while for your second option would allocated memory for both B1 and B2.

This is a java application and I'm using Google's protobuf-java library — Termin4t0r, Mar 25 '20 at 19:00
Unfortunately I have no knowledge of the java implementation. Maybe you could look in the code or ask around specifically on how nested messages are allocated vs oneof's. — Bart, Mar 26 '20 at 08:59
OP I'm also using Java, did you manage to look into the memory usage? — Walton, Jan 12 '21 at 23:13

proto3 - oneof vs fields with identifier

1 Answers1