I'm trying to determine the relationship between default values and the has_foo() methods that are declared in various programmatic interfaces. In particular, I'm trying to determine under what circumstances (if any) you can "tell the difference" between a field explicitly set to the default value, and an unset value.

  1. If I explicitly set a field (e.g. "Bar.foo") to its default value (e.g., zero), is Bar::has_foo() guaranteed to return true for that data structure? (This appears to be true for the C++ generated code, from a quick inspection, but that doesn't mean it's guaranteed.) If this is true, then it's possible to distinguish between an explicitly set default value and an unset field prior to serialization.

  2. If I explicitly set a field to its default value (e.g., zero), and then serialize that object and send it over the wire, will the value be sent or not? If it is not, then clearly any code that receives this object can't distinguish between an explicitly set default value and an unset value. I.e., it won't be possible to distinguish these two cases after serialization -- Bar::has_foo() will return false in both cases.

If it's not possible to tell the difference, what is the recommended technique for encoding a protobuf field if I want to encode a "nullable" optional value? A couple of options come to mind, but neither seems great: (a) add an extra boolean field that records whether the field is set or not, or (b) use a "repeated" field even though I semantically want an optional field. With a repeated field I can tell the difference between no value (a length-zero list) and a set value (a length-one list).

Edward Loper

1 Answer

The following applies to 'proto2' syntax, not 'proto3':

The notion of a field being set or not is a core feature of Protobuf. If you set a field to a value (any value, including its default), then the corresponding has_xxx method must return true; otherwise you have a bug in the API.

If you do not set a field and then serialize the message, no value is sent for that field. The receiving side will parse the message, discover which values were included, and set the corresponding "has_xxx" values.

Exactly how this is implemented in the wire format is documented here: http://code.google.com/apis/protocolbuffers/docs/encoding.html. The short version is that messages are encoded as a sequence of key-value pairs, and only fields which are explicitly set are included in the encoded message.
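To make the key-value encoding concrete, here is a minimal, self-contained sketch of the round trip for a single field. It does not use the protobuf library: `Bar`, `encode`, `decode`, and the varint helpers are hypothetical stand-ins for a proto2 message declared as `optional uint32 foo = 1;`, and `std::optional` (C++17) plays the role of the presence bit.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a proto2 message: `optional uint32 foo = 1;`.
// An empty std::optional models "field not set".
struct Bar {
    std::optional<std::uint32_t> foo;
};

// Append v as a base-128 varint: 7 bits per byte, least significant group first.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read one varint starting at pos, advancing pos past it.
std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t v = 0;
    int shift = 0;
    while (pos < in.size() && shift < 64) {
        const std::uint8_t b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;
        shift += 7;
    }
    return v;
}

// Emit a key-value pair only when the field is present, whatever its value.
std::vector<std::uint8_t> encode(const Bar& bar) {
    std::vector<std::uint8_t> out;
    if (bar.foo.has_value()) {
        put_varint(out, (1u << 3) | 0u);  // key: field number 1, wire type 0 (varint)
        put_varint(out, *bar.foo);
    }
    return out;
}

// Mark the field present only if its key appears in the input.
// This sketch assumes every field on the wire is a varint.
Bar decode(const std::vector<std::uint8_t>& in) {
    Bar bar;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::uint64_t key = get_varint(in, pos);
        const std::uint64_t value = get_varint(in, pos);
        if ((key >> 3) == 1) bar.foo = static_cast<std::uint32_t>(value);
    }
    return bar;
}
```

In this model, a `Bar` whose `foo` was explicitly set to 0 encodes to the two bytes `08 00`, while a `Bar` with `foo` unset encodes to zero bytes, so the receiver can tell the two apart after decoding.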

Default values only come into play when you attempt to read an unset field.
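The accessor behaviour described in this answer can be modelled by hand as follows. This is only an illustrative sketch, not the code that protoc generates: the class `Bar`, the constant `kFooDefault`, and the default of 7 are assumptions, standing in for a proto2 field declared as `optional uint32 foo = 1 [default = 7];`.

```cpp
#include <cstdint>

// Assumed default for the hypothetical field `foo`.
constexpr std::uint32_t kFooDefault = 7;

// Hand-written model of proto2-style accessors with explicit presence.
class Bar {
public:
    // True only after set_foo() has been called (and not cleared since).
    bool has_foo() const { return has_foo_; }

    // Reading an unset field yields the default; presence is unchanged.
    std::uint32_t foo() const { return has_foo_ ? foo_ : kFooDefault; }

    // Setting any value, including the default, marks the field present.
    void set_foo(std::uint32_t value) {
        foo_ = value;
        has_foo_ = true;
    }

    // Clearing returns the field to the unset state.
    void clear_foo() {
        foo_ = kFooDefault;
        has_foo_ = false;
    }

private:
    std::uint32_t foo_ = kFooDefault;
    bool has_foo_ = false;
};
```

With this model, a fresh `Bar` reports `has_foo() == false` and `foo() == 7`; after `set_foo(7)` the value read back is the same, but `has_foo()` is now true.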

Alex Garcia
JesperE
  • Thanks. But to be clear: (1) If I *explicitly* set a field to a value (even its default value), then has_xxx is true; and (2) if I serialize a protobuf object, then a field will be sent over the wire if has_xxx is true, even if the field's value is the default value? – Edward Loper Feb 09 '12 at 14:38
  • I don't think that's correct. It doesn't say anywhere in the spec documentation you linked that: (1) A member should be encoded anyway if its value matches the default. (2) The protobuf implementation must allow you to distinguish between an encoded default value and a default value set during decoding. It's extra bytes to send an empty tag + id over the wire when the definition on the other end already has a suitable default. Further the documentation even states "If any of your elements are optional, the encoded message may or may not have a key-value pair with that tag number." – shanna Aug 29 '13 at 12:18
  • The behavior changed between proto2 and proto3. Under proto3, there is no longer a separate notion of "presence". A field is sent on the wire if and only if it is not equal to its default value. (The exception is message-typed fields, which still follow the proto2 behavior.) – Kenton Varda May 05 '16 at 17:27
  • @KentonVarda So `has_field` is now essentially `is_field_equal_to_default`? – Tobi Akinyemi Jan 16 '21 at 03:34
  • @TobiAkinyemi Sorry, I don't totally know. I created proto2 but I've never used proto3, only heard things. I'd suggest looking at the generated code and seeing what it says. – Kenton Varda Jan 17 '21 at 20:12