In the Protocol Buffers Version 3 Language Specification, the EBNF syntax for an option is:

option = "option" optionName  "=" constant ";"
optionName = ( ident | "(" fullIdent ")" ) { "." ident }
constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | strLit | boolLit 
ident = letter { letter | decimalDigit | "_" }
fullIdent = ident { "." ident }
strLit = ( "'" { charValue } "'" ) |  ( '"' { charValue } '"' )
charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/
hexEscape = '\' ( "x" | "X" ) hexDigit hexDigit
octEscape = '\' octalDigit octalDigit octalDigit
charEscape = '\' ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | '\' | "'" | '"' )

Or in plain English, an option may be assigned a dotted.notation.identifier, an integer, a float, a boolean, or a single- or double-quoted string, which MUST NOT have "raw" newline characters.
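
To make that reading concrete, here is a minimal Python sketch that approximates the `constant` production with regular expressions and checks a few right-hand sides against it. The names `is_spec_constant`, `IDENT`, and so on are my own, and the integer and float patterns are simplified assumptions (no hex or octal integers, no exponents), so treat it as an illustration rather than a faithful implementation of the spec.

```python
import re

# Rough regex approximations of the spec's productions. Simplifying
# assumptions: intLit is decimal only, floatLit has no exponent part.
IDENT = r"[A-Za-z][A-Za-z0-9_]*"
FULL_IDENT = rf"{IDENT}(?:\.{IDENT})*"
INT_LIT = r"[-+]?\d+"
FLOAT_LIT = r"[-+]?(?:\d+\.\d*|\.\d+|inf|nan)"
# charValue: hex escape, octal escape, char escape, or any character
# other than NUL, newline, and backslash.
CHAR_VALUE = r"""(?:\\[xX][0-9A-Fa-f]{2}|\\[0-7]{3}|\\[abfnrtv\\'"]|[^\0\n\\])"""
STR_LIT = rf"""(?:'{CHAR_VALUE}*'|"{CHAR_VALUE}*")"""

# boolLit ("true" / "false") is already covered by FULL_IDENT.
CONSTANT = re.compile(rf"(?:{FULL_IDENT}|{INT_LIT}|{FLOAT_LIT}|{STR_LIT})")


def is_spec_constant(text: str) -> bool:
    """Return True if text matches the (approximated) constant production."""
    return CONSTANT.fullmatch(text.strip()) is not None


print(is_spec_constant('"/v1/messages/{message_id}"'))  # quoted string
print(is_spec_constant("com.example.foo"))              # dotted identifier
print(is_spec_constant('{ patch: "/v1/x" body: "*" }'))  # brace-delimited value
```

Under this approximation, a quoted string or a dotted identifier matches, while a brace-delimited value or a string containing a raw newline does not.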

And yet, I'm encountering .proto files in various projects, such as grpc-gateway and googleapis, where the right-hand side of the assignment is not quoted and spans multiple lines. For example, in googleapis/google/api/http.proto there is this service definition in a comment block:

//     service Messaging {
//       rpc UpdateMessage(Message) returns (Message) {
//         option (google.api.http) = {
//           patch: "/v1/messages/{message_id}"
//           body: "*"
//         };
//       }
//     }

In other files, the use of semicolons (and occasionally commas) as separators seems somewhat arbitrary, and I have also seen keys repeated, which in JSON or JavaScript would result in loss of data due to overwriting.

Are there any canonical extensions to the language specification, or are people just Microsofting? (Yes, that's a verb now.)

Jeff

1 Answer

I posted a similar question on the Protocol Buffers Google Group and received a private message from a fellow at Google stating the following:

This syntax is correct and valid for setting fields on a proto option field which is itself a field referencing a message type. This form is based on the TextFormat spec which I'm unclear if its super well documented, but here's an implementation of it: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format

When I have time, I will try to unpack what I learn from analyzing TextFormat.

Update: I received an answer on the Groups forum:

I think for better or worse, "what protoc implements" takes precedence over whatever the spec says. The spec came later and as far as I know we have not put a lot of effort into ensuring that it comprehensively matches the format that protoc expects. I believe the syntax you are looking at is missing from the .proto file format spec but is mentioned here as the "aggregate syntax."

The link above is to a section titled Custom Options in the Language Guide (proto2) page. If you scroll all the way to the end of that section, there is the following snippet that mentions TextFormat:

message FooOptions {
  optional int32 opt1 = 1;
  optional string opt2 = 2;
}

extend google.protobuf.FieldOptions {
  optional FooOptions foo_options = 1234;
}

// usage:
message Bar {
  optional int32 a = 1 [(foo_options).opt1 = 123, (foo_options).opt2 = "baz"];
  // alternative aggregate syntax (uses TextFormat):
  optional int32 b = 2 [(foo_options) = { opt1: 123 opt2: "baz" }];
}
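
To get a feel for how the aggregate syntax behaves, here is a small Python sketch of a toy parser for the brace-delimited form. It is not protoc's parser and not the real TextFormat implementation; `parse_aggregate` and its helpers are hypothetical names of mine, string escapes are left unprocessed, and malformed input is not handled gracefully. It assumes, as I understand the TextFormat rules, that a `,` or `;` after a field is optional and that repeating a field name adds another value rather than overwriting the previous one, which would explain the seemingly arbitrary separators and repeated keys mentioned in the question.

```python
import re

# One token: a quoted string, a punctuation character, or a bare atom
# (field name, number, or identifier).
TOKEN = re.compile(
    r"""\s*(?:(?P<str>"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')|(?P<punct>[{}:,;])|(?P<atom>[^\s{}:,;"']+))"""
)


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


def scalar(token):
    # Strings keep their raw contents (escapes are not decoded here).
    if token[0] in "\"'":
        return token[1:-1]
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token  # identifiers such as true, false, or enum names


def parse_aggregate(text):
    """Parse '{ name: value ... }' into a dict mapping names to value lists."""
    tokens = tokenize(text)
    pos = 0

    def parse_message():
        nonlocal pos
        if tokens[pos] != "{":
            raise ValueError("expected '{'")
        pos += 1
        fields = {}
        while tokens[pos] != "}":
            name = tokens[pos]
            pos += 1
            if tokens[pos] == ":":  # the colon may be omitted before a nested message
                pos += 1
            if tokens[pos] == "{":
                value = parse_message()
            else:
                value = scalar(tokens[pos])
                pos += 1
            fields.setdefault(name, []).append(value)  # repeated names accumulate
            if tokens[pos] in (",", ";"):  # optional separator
                pos += 1
        pos += 1  # consume the closing brace
        return fields

    return parse_message()


print(parse_aggregate('{ opt1: 123 opt2: "baz" }'))
print(parse_aggregate('{ a: 1, a: 2; a: 3 }'))
```

Every value is stored in a list so that a repeated key keeps all of its values, unlike a JSON object, where the last assignment would win.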
Jeff