maximum field number in protobuf message

Question

The official documentation for protocol buffers (https://developers.google.com/protocol-buffers/docs/proto3) says the maximum field number in a protobuf message is 2^29-1. Why does this limit exist? Can anyone explain it in some detail? I am new to this.

I read the answers to the question "why 2^29-1 is the biggest key in protocol buffers", but they did not make it clear to me.

Maik · Answer 1 · 2020-03-29T19:16:49.500

Each field in an encoded protocol buffer has a header (called key or tag) prefixed to the actual encoded value. The encoding spec defines this key:

Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.

Here the spec says the tag is a varint whose lowest 3 bits encode the wire type. A varint can encode a 64-bit value, so going by this definition alone the limit would be 2^61-1.

In addition, the Language Guide narrows this down to a value of at most 32 bits:
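To make the key formula concrete, here is a minimal Python sketch. The helper names make_key and split_key are made up for illustration and are not part of any protobuf library.

```python
WIRE_TYPE_BITS = 3  # the low three bits of a key hold the wire type


def make_key(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type as (field_number << 3) | wire_type."""
    return (field_number << WIRE_TYPE_BITS) | wire_type


def split_key(key: int) -> tuple:
    """Recover (field_number, wire_type) from a key."""
    return key >> WIRE_TYPE_BITS, key & 0b111


# Field 1 with wire type 0 (varint) gives the key 8.
print(make_key(1, 0))  # 8

# If a key could use all 64 bits, the largest field number would be 2^61 - 1.
print((2**64 - 1) >> WIRE_TYPE_BITS == 2**61 - 1)  # True
```

With a 32-bit key the same arithmetic leaves 32 - 3 = 29 bits for the field number.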

The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911.

The reasons for this are not given, so I can only speculate:

It is an artificial limit, as no one expects a message to have that many fields. Just think about fitting a message with that many fields into memory.
As the key is a varint, it isn't simply the next 4 bytes in the raw buffer, but a variable number of bytes (see the Java code reading a varint32). Each byte carries 7 bits of actual data and 1 bit indicating whether the end has been reached. It could be that limiting the range was deemed better for performance.
Since proto3 is the third version of protocol buffers, it could be that either proto1 or proto2 defined the tag to be a varint32. To keep backwards compatibility, this limit still holds in proto3 today.
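The varint layout mentioned in the second point can be sketched like this. It is an illustrative encoder under the usual little-endian base-128 scheme, not the library's code; the name encode_varint is made up.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint: 7 data bits per byte, low bits first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: this is the last byte
            return bytes(out)


# Key for the largest field number (2^29 - 1) with wire type 0.
max_key = ((2**29 - 1) << 3) | 0
print(encode_varint(max_key).hex())  # f8ffffff0f
```

A 32-bit key therefore never needs more than 5 bytes, whereas a full 64-bit varint can take up to 10.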

score 1 · Answer 2 · answered Aug 31 '20 at 07:32

Because of this line:

#define GOOGLE_PROTOBUF_WIRE_FORMAT_MAKE_TAG(FIELD_NUMBER, TYPE) \
  static_cast<uint32>((static_cast<uint32>(FIELD_NUMBER) << 3) | (TYPE))

This line creates a "tag", which leaves only 29 (32 - 3) bits to store the field number.

I don't know why Google uses uint32 instead of uint64 though, since the tag is encoded as a varint. Maybe they think 2^29-1 fields is large enough for a single message declaration.
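To see what the 32-bit cast implies, here is a rough Python emulation of the macro's arithmetic. Python integers are unbounded, so the masking with 0xFFFFFFFF stands in for uint32; make_tag_u32 is a hypothetical name, not a protobuf function.

```python
U32_MASK = 0xFFFFFFFF  # emulate uint32 wrap-around, since Python ints never overflow


def make_tag_u32(field_number: int, wire_type: int) -> int:
    """Mimic static_cast<uint32>((static_cast<uint32>(FIELD_NUMBER) << 3) | (TYPE))."""
    return (((field_number & U32_MASK) << 3) | wire_type) & U32_MASK


# The largest allowed field number still fits.
print(hex(make_tag_u32(2**29 - 1, 0)))  # 0xfffffff8

# One more and the top bit is shifted out of the 32-bit value.
print(make_tag_u32(2**29, 0))  # 0
```

In this emulation a field number of 2^29 collapses to tag 0, which is why 2^29-1 is the largest number a 32-bit tag can carry.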

Marc Gravell · Answer 3 · 2019-08-16T08:30:58.103

I suspect this is simply so that a field-header (wire-type and tag-number) can be decoded and handled as a 32-bit value. The wire-type is always the 3 least significant bits, leaving 29 bits for the tag number. Technically "varint" should support 64 bits, but it makes sense to limit it to reasonable numbers, not least because "varint" encoding means that larger numbers take more bytes to encode.

Edit: I realise now that this is similar to the linked post, but... it remains true! Each field in protobuf is prefixed by a "varint" that expresses which field (tag-number) follows and what data type it is (wire-type). The latter is especially important so that unexpected fields (version differences) can be stored or skipped correctly. It is convenient for that field-header to be trivially processed by most frameworks, and most frameworks are fine with 32-bit integers.

edited Aug 16 '19 at 08:30

answered Aug 16 '19 at 08:17


Is the tag-number the same as the field number? – neha deshpande Aug 16 '19 at 09:24
Is there any specific reason to use a 32-bit value for the field header? – neha deshpande Aug 16 '19 at 09:45
@nehadeshpande yes, I already said: a) it works well on all platforms, b) it is efficient computationally, c) it prevents field headers becoming unnecessarily large – Marc Gravell Aug 16 '19 at 10:18
@nehadeshpande for contrast to 64-bit: platforms that only deal with numbers via IEEE floating point (Lua, for example) can't represent all 64-bit integers correctly – Marc Gravell Aug 16 '19 at 10:19
Can you help me with one more thing if possible? I have defined a message with an int32 field whose field number is the maximum, 2^29-1. The output I get after encoding the message is b'\xf8\xff\xff\xff\x0f\x01'. I am not able to understand what each byte here represents – neha deshpande Aug 16 '19 at 10:52
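The bytes from the last comment can be taken apart with a small varint decoder. This is only a sketch: decode_varint is a made-up helper, not a protobuf API.

```python
def decode_varint(data: bytes, pos: int = 0):
    """Read one varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits are data
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


encoded = b'\xf8\xff\xff\xff\x0f\x01'

# First varint: the field header (key).
key, pos = decode_varint(encoded)
field_number, wire_type = key >> 3, key & 0x7

# Second varint: the field's value (wire type 0 means varint).
value, pos = decode_varint(encoded, pos)

print(field_number, wire_type, value)  # 536870911 0 1
```

So the first five bytes, f8 ff ff ff 0f, are the key (field number 2^29-1, wire type 0), and the final byte 01 is the varint-encoded value 1.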

score 0 · Answer 4 · answered Sep 22 '20 at 10:59

This is another question rather than a comment. The document says:

Field numbers in the range 16 through 2047 take two bytes. So you should reserve the numbers 1 through 15 for very frequently occurring message elements. Remember to leave some room for frequently occurring elements that might be added in the future.

In the first byte, if the top 5 bits are used for the field number and the bottom 3 bits for the field type, wouldn't it be field numbers from 31 (because zero is not used) to 2047 that take two bytes? (I also guess the lower 3 bits of the second byte are used for the field type as well. I'm in the middle of reading about it, so I'll fix this when I know.)
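For what it's worth, the 16-through-2047 range can be checked with a short calculation. In a varint the top bit of every byte is a continuation flag, so the first byte has only 7 payload bits: 3 for the wire type and 4 for the field number. The sketch below counts key bytes under that rule; key_size is a hypothetical helper, not part of protobuf.

```python
def key_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the varint-encoded key (field_number << 3) | wire_type occupies."""
    key = (field_number << 3) | wire_type
    size = 1
    while key >= 0x80:  # more than 7 bits left: another byte is needed
        key >>= 7
        size += 1
    return size


for n in (1, 15, 16, 2047, 2048, 2**29 - 1):
    print(n, key_size(n))
```

Under this rule field numbers 1 through 15 take one byte and 16 through 2047 take two, matching the documentation. The wire type is stored only once, in the low 3 bits of the first byte; the second byte contributes 7 more field-number bits.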
