Here on java side can I deserialized the stream , so that I can find out there are 3 fields in the stream and its respective name
, type
, and value
You will need to know the schema in advance. Firstly, protobuf does not transmit names; all it uses as identifiers is the numeric key (1
, 2
and 3
in your example) of each field. Secondly, it does not explicitly specify the type; there are only a very few wire-types in protobuf (varint, 32-bit, 64-bit, length-prefix, group); actual data types are mapped onto those, but you cannot unambiguously decode data without the schema
- varint is "some form of integer", but could be signed, unsigned or "zigzag" (which allows negative numbers of small magnitude to be cheaply encoded), and could be intended to represent any width of data (64 bit, 32 bit, etc)
- 32-bit could be an integer, but could be signed or unsigned - or it could be a 32-bit floating-point number
- 64-bit could be an integer, but could be signed or unsigned - or it could be a 64-bit floating-point number
- length-prefix could be a UTF-8 string, a sequence or raw bytes (without any particular meaning), a "packed" set of
repeated
values of some primitive type (integer, floating point, etc), or could be a structured sub-message in protobuf format
- groups - hoorah! this is always unambigous! this can only mean one thing; but that one thing is largely deprecated by google :(
So fundamentally: you need the schema. The encoded data does not include what you want. It does this to avoid unnecessary space - if the protocol assumes that the encoder and decoder both know what the message is meant to look like, then a lot less information needs to be sent.
Note, however, that the information that is included is enough to safely round-trip a message even if there are fields that are not expected; it is not necessary to know the name or type if you only need to re-encode it to pass it along / back.
What you can do is use the parser API to scan over the data to reveal that there are three fields, field 1 is a varint, field 2 is length-prefixed, field 3 is length-prefixed. You could make educated guesses about the data beyond that (for example, you could see whether a UTF-8 decode produces something that looks roughly text-like, and verify that UTF-8 encoding that gives you back the original bytes; if it does, it is possible it is a string)