
I have two protobuf messages that I can't decode unless I remove some bytes from the beginning and/or the end.

The first message, in base64, is:

RQAAAPEmCgV0ZXN0MRAXGbj4KzKrldhBIAFCAQBKAQBRAAEAsPC/WQAAAAAAAPC/

which translates to this byte array:

45 00 00 00 f1 26 0a 05 74 65 73 74 31 10 17 19 b8 f8 2b 32 ab 95 d8 41 20 01 42 01 00 4a 01 00 51 00 01 00 b0 f0 bf 59 00 00 00 00 00 00 f0 bf

If I call Google's protobuf Parser.ParseFrom(<above byte array>), it fails with the exception "Protocol message contained a tag with an invalid wire type."

But if I skip the first 6 bytes and the last 7 (leaving 0a 05 74 65 73 74 31 10 17 19 b8 f8 2b 32 ab 95 d8 41 20 01 42 01 00 4a 01 00 51 00 01 00 b0 f0 bf 59 00), it works.

Likewise, Parser.ParseDelimitedFrom(<stream of the complete byte array>) fails in the same way, unless I remove 4 bytes (not 6 this time) from the beginning and 7 from the end (leaving f1 26 0a 05 74 65 73 74 31 10 17 19 b8 f8 2b 32 ab 95 d8 41 20 01 42 01 00 4a 01 00 51 00 01 00 b0 f0 bf 59 00).

The second message is:

RgAAAPA3CgV0ZXN0MhAmGdimrTmrldhBIAFCAQBKAgABUfYOIizeJGRAWYxpKU8HvFFA

or, as bytes:

46 00 00 00 f0 37 0a 05 74 65 73 74 32 10 26 19 d8 a6 ad 39 ab 95 d8 41 20 01 42 01 00 4a 02 00 01 51 f6 0e 22 2c de 24 64 40 59 8c 69 29 4f 07 bc 51 40

It behaves like the first message, except that I don't need to remove any bytes from the end: removing the first 6 bytes is enough for Parser.ParseFrom, and removing the first 4 is enough for Parser.ParseDelimitedFrom.

Can someone please explain to me what's going on? Does it have something to do with CodedInputStream?

Cosmin Ivan

1 Answer


The message has some framing applied around it. The protobuf standard does not specify framing, and in many cases (such as base64 strings) there is no strict need for any.

You should get the specification of the framing from whoever wrote the other end of the communication.
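For comparison, the one framing convention that protobuf libraries do ship is the length-delimited one used by ParseDelimitedFrom (WriteDelimitedTo on the writing side): each message is preceded by its byte length encoded as a varint. Below is a minimal Python sketch of that convention; the helper names are hypothetical and not part of any protobuf library. It shows why the captured payloads do not fit this framing: the first byte, 0x45, would be read as a length of 69, but only 47 bytes follow it in the first message.

```python
import base64


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos and return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def write_delimited(payload: bytes) -> bytes:
    """Prefix a serialized message with its length as a varint."""
    return encode_varint(len(payload)) + payload


def split_delimited(stream: bytes) -> list[bytes]:
    """Split a buffer of varint-length-prefixed messages into payloads."""
    frames, pos = [], 0
    while pos < len(stream):
        length, pos = read_varint(stream, pos)
        if pos + length > len(stream):
            raise ValueError(
                f"frame claims {length} bytes, only {len(stream) - pos} left")
        frames.append(stream[pos:pos + length])
        pos += length
    return frames


msg1 = base64.b64decode(
    "RQAAAPEmCgV0ZXN0MRAXGbj4KzKrldhBIAFCAQBKAQBRAAEAsPC/WQAAAAAAAPC/")
try:
    split_delimited(msg1)
except ValueError as err:
    print(err)  # frame claims 69 bytes, only 47 left
```

A real delimited stream round-trips through these two helpers; the captured bytes do not, which supports the conclusion that some other framing is in play.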

There are some things we can infer:

  • The first 4 bytes are some kind of message index number: they don't match the message length and seem to increase by small values.
  • The next two bytes could be some kind of checksum. They don't match the message length either, so this is not a delimited message.
  • I have no idea why the first message has 7 extra bytes at the end and the second one does not.

But these are not enough to write a reliable parser.
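The exception itself can be reproduced without the generated classes by walking the protobuf wire format by hand. Below is a minimal Python sketch (walk_fields is a hypothetical helper, not a protobuf API) that records each top-level tag and stops at the first one a parser would reject. On the full first message, 0x45 reads as field 8 with wire type 5 (fixed32), which swallows the next four bytes, and the following byte 0x26 is field 4 with wire type 6, which does not exist. With the first 6 bytes skipped, the walker accepts fields 1, 2, 3, 4, 8, 9 and 10 over exactly 35 bytes, and the very next byte (00) is a zero tag, which is why the trailing 7 bytes had to go as well.

```python
import base64


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at pos and return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def walk_fields(buf: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, field_number, wire_type) for each top-level field.

    Raises ValueError at the first tag a protobuf parser would reject.
    Deprecated group wire types (3 and 4) are not handled in this sketch.
    """
    fields, pos = [], 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 0:
            raise ValueError(f"invalid tag (zero) at offset {start}")
        if wire_type == 0:      # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # 64-bit (fixed64, double)
            pos += 8
        elif wire_type == 2:    # length-delimited (string, bytes, messages)
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:    # 32-bit (fixed32, float)
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type} at offset {start}")
        if pos > len(buf):
            raise ValueError(f"truncated field at offset {start}")
        fields.append((start, field_number, wire_type))
    return fields


msg1 = base64.b64decode(
    "RQAAAPEmCgV0ZXN0MRAXGbj4KzKrldhBIAFCAQBKAQBRAAEAsPC/WQAAAAAAAPC/")

for label, data in [("full", msg1), ("[6:41]", msg1[6:41]), ("[6:]", msg1[6:])]:
    try:
        print(label, [f for _, f, _ in walk_fields(data)])
    except ValueError as err:
        print(label, "->", err)
# full -> unsupported wire type 6 at offset 5
# [6:41] [1, 2, 3, 4, 8, 9, 10]
# [6:] -> invalid tag (zero) at offset 35
```

The second message fails even earlier: its first byte, 0x46, is field 8 with wire type 6.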

jpa
  • Thank you for looking into this! I just got a reply from the person who sent the payloads and apparently the message had to be decompressed using LZ4. There goes a couple of days of debugging... – Cosmin Ivan Apr 20 '22 at 10:53
  • @user3305815 Heh, funny how LZ4 preserves enough of the protobuf structure to be partially recognizable. – jpa Apr 20 '22 at 12:00
  • I know, right? It completely confused me! – Cosmin Ivan Apr 20 '22 at 13:30
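The LZ4 explanation also accounts for the odd bytes. An LZ4 block is a series of sequences: a token byte (high nibble: literal count; low nibble: match length minus 4), the literal bytes stored verbatim, then a 2-byte little-endian offset that copies earlier output. Because literal runs are stored uncompressed, most of the protobuf bytes survive intact. Below is a minimal pure-Python decoder for a raw block, a sketch for illustration rather than a replacement for a real LZ4 library. Its demo input is hand-built: the leading token 0x21 is our own assumption, while the remaining bytes are the tail of the first message (51 00 01 00 b0 f0 bf 59 00 00 00 00 00 00 f0 bf). Read as LZ4, 01 00 is a match offset of 1 that repeats the preceding 00, and b0 is the next token, so the tail expands to fields 10 and 11, each holding the double -1.0. How the sender framed the whole payload, including the first 6 bytes, is not something this sketch tries to determine.

```python
import struct


def _read_length(src: bytes, pos: int, base: int) -> tuple[int, int]:
    """Extend a 4-bit length field: a value of 15 means extra bytes follow,
    each added to the length, until one of them is below 255."""
    length = base
    if base == 15:
        while True:
            b = src[pos]
            pos += 1
            length += b
            if b != 255:
                break
    return length, pos


def lz4_block_decompress(src: bytes) -> bytes:
    """Decompress one raw LZ4 block (no frame header, no size prefix)."""
    dst = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1
        # High nibble of the token: number of literal bytes.
        lit_len, pos = _read_length(src, pos, token >> 4)
        if pos + lit_len > len(src):
            raise ValueError("truncated literals")
        dst += src[pos:pos + lit_len]
        pos += lit_len
        if pos == len(src):
            break  # the last sequence carries literals only
        # Little-endian 16-bit offset back into the already-decoded output.
        if pos + 2 > len(src):
            raise ValueError("truncated match offset")
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(dst):
            raise ValueError("invalid match offset")
        # Low nibble of the token: match length minus the minimum of 4.
        match_len, pos = _read_length(src, pos, token & 0x0F)
        match_len += 4
        # Copy one byte at a time so overlapping matches (offset < length) work.
        start = len(dst) - offset
        for i in range(match_len):
            dst.append(dst[start + i])
    return bytes(dst)


# Hand-built block: only the leading token 0x21 is ours (2 literals, match
# length 1 + 4 = 5); the remaining bytes are the tail of the first message.
block = bytes.fromhex("21" "5100" "0100" "b0" "f0bf59000000000000f0bf")
out = lz4_block_decompress(block)
print(out.hex())                            # 51000000000000f0bf59000000000000f0bf
print(struct.unpack("<d", out[1:9])[0])     # -1.0 (field 10)
print(struct.unpack("<d", out[10:18])[0])   # -1.0 (field 11)
```

For real traffic, decompress with a maintained LZ4 library first and only then hand the result to Parser.ParseFrom; what the leading bytes around the block mean still has to come from the sender.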