In the Protocol Buffers Version 3 Language Specification
The EBNF syntax for an option
is
option = "option" optionName "=" constant ";"
optionName = ( ident | "(" fullIdent ")" ) { "." ident }
constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | strLit | boolLit
ident = letter { letter | decimalDigit | "_" }
fullIdent = ident { "." ident }
strLit = ( "'" { charValue } "'" ) | ( '"' { charValue } '"' )
charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/
hexEscape = '\' ( "x" | "X" ) hexDigit hexDigit
octEscape = '\' octalDigit octalDigit octalDigit
charEscape = '\' ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | '\' | "'" | '"' )
Or in plain English, an option
may be assigned a dotted.notation.identifier, an integer, a float, a boolean, or a single- or double-quoted string, which MUST NOT have "raw" newline characters.
And yet, I'm encountering .proto files in various projects such as grpc-gateway and googleapis, where the rhs of the assignment is not quoted and spans multiple lines. For example in googleapis/google/api/http.proto there is this service
definition in a comment block:
// service Messaging {
// rpc UpdateMessage(Message) returns (Message) {
// option (google.api.http) = {
// patch: "/v1/messages/{message_id}"
// body: "*"
// };
// }
// }
In other files, the use of semicolons (and occasionally commas) as separators seems somewhat arbitrary, and I have also seen keys repeated, which in JSON or JavaScript would result in loss of data due to overwriting.
Are there any canonical extensions to the language specification, or are people just Microsofting? (Yes, that's a verb now.)