Common practice for using protobufs over the wire, including by gRPC, is to length-prefix protobuf messages into frames (e.g. like this) so that the decoder knows where one message stops and the next starts.
This seems unnecessary. According to the spec, a protobuf message is composed of a sequence of tags, each followed by a value:
message := (tag value)*
tag := (field << 3) bit-or wire_type
Once a tag is read, its wire type tells the parser how to find the end of the value (a fixed width, a VARINT's continuation bits, or an explicit length prefix), so the parser can parse the value in its entirety without needing more metadata. Thus, the outer length is only needed to determine whether, at any given point, there are more tags left to parse in the message.
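To make this concrete, here is a minimal sketch of reading one tag-value pair using only the tag's wire type. The helper names read_varint and skip_field are mine, not from any protobuf library, and the deprecated group wire types (3 and 4) are deliberately rejected rather than handled.

```python
# Wire type numbers as listed in the protobuf encoding documentation.
VARINT, I64, LEN, SGROUP, EGROUP, I32 = 0, 1, 2, 3, 4, 5


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Read one tag-value pair at pos; return (field, wire_type, next_pos)."""
    tag, pos = read_varint(buf, pos)
    field, wire_type = tag >> 3, tag & 0x7
    if wire_type == VARINT:
        _, pos = read_varint(buf, pos)       # ends where the continuation bit clears
    elif wire_type == I64:
        pos += 8                             # fixed 8 bytes
    elif wire_type == LEN:
        length, pos = read_varint(buf, pos)  # explicit length prefix
        pos += length
    elif wire_type == I32:
        pos += 4                             # fixed 4 bytes
    else:
        # Assumption of this sketch: groups are not supported.
        raise ValueError(f"unsupported wire type {wire_type}")
    return field, wire_type, pos
```

For example, skip_field(bytes([0x08, 0x96, 0x01]), 0) reads field 1 as a VARINT and stops at offset 3 without being told how long the value is.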
An obvious non-length solution presents itself: null (0x0) termination. Field indices start at 1, so a tag can never be 0: the smallest possible tag, field = 1 with the VARINT wire type (0), encodes as 0b1000 = 8. A tag that fits in a single VARINT byte is therefore at least 8, and a tag that needs several bytes has the MSB (the continuation bit) set on its first byte, so an encoded tag always begins with a nonzero byte. Thus, if:
- the parser is in between tag-value pairs and
- encounters a null byte
it follows that this byte is not part of the rest of the protobuf message and thus a corresponding action (such as terminating the message) may be taken.
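The scheme I have in mind can be sketched roughly as follows. This is only an illustration under my own assumptions: split_null_terminated and read_varint are hypothetical names, group wire types are rejected, and the input is assumed to be otherwise well-formed.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte of the varint
            return result, pos
        shift += 7


def split_null_terminated(stream: bytes) -> list[bytes]:
    """Split a stream of 0x00-terminated protobuf messages into raw messages."""
    messages = []
    start = pos = 0
    while pos < len(stream):
        # We are between tag-value pairs here, so a zero byte cannot be a tag.
        if stream[pos] == 0:
            messages.append(stream[start:pos])
            pos += 1
            start = pos
            continue
        tag, pos = read_varint(stream, pos)
        wire_type = tag & 0x7
        if wire_type == 0:    # VARINT
            _, pos = read_varint(stream, pos)
        elif wire_type == 1:  # I64
            pos += 8
        elif wire_type == 2:  # LEN
            length, pos = read_varint(stream, pos)
            pos += length
        elif wire_type == 5:  # I32
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    if pos != start:
        raise ValueError("stream ended inside an unterminated message")
    return messages


# Two messages, each followed by a null terminator. The second message
# contains a zero byte as a field value, which is not mistaken for a terminator.
stream = bytes([0x08, 0x96, 0x01, 0x00, 0x08, 0x00, 0x00])
parts = split_null_terminated(stream)
```

Zero bytes inside values are skipped as part of the value; only a zero byte at a tag position ends a message.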
All of this seems very obvious given the protobuf specification, so am I simply missing some detail that breaks this?
Another way of phrasing the question: is there any case where a 0x0 tag appears within a valid message? It appears that nanopb used null termination for a while and only stopped due to issues with debugging broken encoders.