3

We are using protobuf v3 to transfer messages from a C# client to a Java server over HTTP.

The message proto looks like this:

message CLIENT_MESSAGE {
    string message = 1;
}

Both the client and the server use UTF-8 character encoding for strings.

Everything is fine when we use short string values like "abc", but when we try to transfer a string with 198 chars in it, we catch an exception:

   com.google.protobuf.InvalidProtocolBufferException: 
    While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.

We even compared the byte arrays containing the protobuf data, but didn't find a solution. For the string "aaa" the byte array starts with these bytes:

10 3 97 97 97

Where 10 is protobuf field number, and 3 is string length, 97 97 97 is "aaa".

For string

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

which contains 198 characters, the byte array starts with this:

10 198 1 97 97 97....

Where 10 is protobuf field number, and 198 is string length, and 1 seems to be like string identifier, or what?

And why protobuf cannot parse this message?

We have already spent almost a day looking for a solution to this problem; any help is appreciated.

UPDATE:

We made dumps on both the client and the server, and, weirdly, the dumps are different!

Protobuf dump from client, before sending to server:

00000000   0A C6 01 61 61 61 61 61  61 61 61 61 61 61 61 61   ·Æ·aaaaaaaaaaaaa
00000010   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000020   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000030   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000040   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000050   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000060   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000070   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000080   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00000090   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
000000A0   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
000000B0   61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
000000C0   61 61 61 61 61 61 61 61  61                        aaaaaaaaa  

Protobuf dump which server receives:

0000: 0A EF BF BD 01 61 61 61 61 61 61 61 61 61 61 61   .....aaaaaaaaaaa
0010: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0020: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0030: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0040: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0050: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0060: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0070: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0080: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
0090: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00A0: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00B0: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61   aaaaaaaaaaaaaaaa
00C0: 61 61 61 61 61 61 61 61 61 61 61                   aaaaaaaaaaa

As you can see, the protobuf data headers are different... That's totally breaking my mind; how could that happen?

UPDATE2: we did some research and found that this problem happens only with strings longer than 128 symbols. If the string consists of 128 symbols or fewer, there is no problem.

NewJ
  • Update: we did some research and found that this problem happens only with strings longer than 128 symbols. If the string consists of 128 symbols or fewer, there is no problem. – NewJ May 17 '18 at 10:47

2 Answers

6

Well, in the end the problem was character encoding: we had tried to convert the binary protobuf data to a string.

If you need to transfer binary protobuf data as a string, encode it to base64 on the client first, then decode it from base64 on the server.
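
For the Java side, a minimal sketch of that base64 round trip using the standard `java.util.Base64` class. The class and method names (`Base64TransportDemo`, `toTransportString`, `fromTransportString`) are made up for illustration, and the four sample bytes are just the start of the failing message from the question:

```java
import java.util.Arrays;
import java.util.Base64;

class Base64TransportDemo {
    // What the sender would do before putting the payload into a text field.
    static String toTransportString(byte[] protoBytes) {
        return Base64.getEncoder().encodeToString(protoBytes);
    }

    // What the receiver would do before handing the bytes to the protobuf parser.
    static byte[] fromTransportString(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // First bytes of the 198-character message: tag, two-byte length, first 'a'.
        byte[] protoBytes = {0x0A, (byte) 0xC6, 0x01, 0x61};

        String text = toTransportString(protoBytes);
        byte[] restored = fromTransportString(text);

        System.out.println("transport string: " + text);
        System.out.println("bytes preserved:  " + Arrays.equals(protoBytes, restored));
    }
}
```

The decoded array can then be passed to the generated message class's `parseFrom(byte[])`. If the HTTP body can carry raw bytes, sending the array directly avoids the encoding step altogether.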

Thanks @Marc Gravell for help

NewJ
2

Where 10 is protobuf field number,

Yes; field 1, length-prefixed.

and 198 is string length, and 1 seems to be like string identifier, or what?

The 198 1 is the string length, encoded with "varint" encoding; this computes as the integer 198, but takes two bytes to encode.
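
To make the header bytes concrete, here is a small Java sketch that splits the tag byte into field number and wire type and decodes the varint length by hand. `VarintDemo` and `readVarint` are illustrative names, not part of the protobuf library; real code should rely on the library's own parser:

```java
class VarintDemo {
    /** Decodes one base-128 varint that starts at the given offset. */
    static long readVarint(byte[] buf, int offset) {
        long result = 0;
        int shift = 0;
        for (int i = offset; i < buf.length; i++) {
            int b = buf[i] & 0xFF;                    // treat the byte as unsigned
            result |= (long) (b & 0x7F) << shift;     // low 7 bits carry the payload
            if ((b & 0x80) == 0) {
                return result;                        // high bit clear: last byte
            }
            shift += 7;
            if (shift >= 64) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        // Header of the failing message: 10 198 1 in decimal.
        byte[] header = {0x0A, (byte) 0xC6, 0x01};

        int tag = header[0] & 0xFF;
        System.out.println("field number = " + (tag >>> 3));   // 1
        System.out.println("wire type    = " + (tag & 0x07));  // 2 (length-prefixed)
        System.out.println("length       = " + readVarint(header, 1)); // 198
    }
}
```

The length works out as 0x46 + (0x01 << 7) = 70 + 128 = 198. Any length above 127 needs a byte with the high bit set, which is why such messages contain bytes that are not valid ASCII.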

And why protobuf cannot parse this message?

We'd need to see the rest of the bytes; the library could be very correct if you don't have all the bytes. Do you have all the bytes for the failing case, perhaps as hex or base-64?

Marc Gravell
  • Hello Marc, and thanks for the answer. The full bytes sent from the client look like this: https://pastebin.com/092suw39 – NewJ May 17 '18 at 10:03
  • @NewJ the total buffer here should be 201 bytes, note; if the buffer you're giving it isn't this length: that'll be the problem – Marc Gravell May 17 '18 at 10:06
  • @NewJ ah, I didn't see the edit with the pastebin data; yes, that data looks correct and decodes correctly for me. So; the next question is: what is the exact data ***including the length*** that the server is processing? is it *also* 201 bytes of the exact same data? – Marc Gravell May 17 '18 at 10:25
  • Marc, we made dumps on both the client and the server, and they differ! Dump data: https://pastebin.com/Zr6q47NT I have no idea why that happens; we already checked the library versions on both client and server again – NewJ May 17 '18 at 10:35
  • We did some research and found that this problem happens only with strings longer than 128 symbols. If the string consists of 128 symbols or fewer, there is no problem. – NewJ May 17 '18 at 10:48
  • 1
    @NewJ look at the start of the file; the client sent "0A C6 01 61xlots" the server received "0A EF BF BD 01 61xlots"; until the server receives the data you sent, nothing else will work - you corrupted the data in transmission. So: *why* did the server get that? my guess would be that you've text-encoded the binary, which would be very wrong. – Marc Gravell May 17 '18 at 11:00
  • 1
    @NewJ I ran a sweep of all text encodings I can think of, and none would produce `0A EF BF BD 01 61` from `0A C6 01 61`, so ... not sure what you've done exactly, but somewhere between serializing at the client and deserializing at the server: the data is different. So: until the data at the server **matches what the client thinks it sent**: all bets are off. I can't see that code, so I can't really speculate what you've done. – Marc Gravell May 17 '18 at 11:07
  • Marc, thanks for the answer again, you helped us a lot. We found that the protobuf message gets "corrupted" after encoding on the client. I uploaded the client code with explanatory comments: https://pastebin.com/Aky0KXkA Do you have an idea how we could fix that? We are using the UnityWebRequest class to send HTTP POST requests to the server – NewJ May 17 '18 at 11:31
  • It seems that the protobuf data gets corrupted when it is converted to a UTF-8 string here: string requestData = Encoding.UTF8.GetString(protoBytes); – NewJ May 17 '18 at 11:33
  • 2
    @NewJ well, yes; that line is simply wrong - it is using a text encoding *backwards*. That isn't a valid way to convert arbitrary binary (meaning: not encoded text) to a string. What you need is something like hex or base-64. I'd go with base-64 (it'll be shorter), in which case: `string base64 = Convert.ToBase64String(protoBytes);`. Just about every framework has a pre-built API to encode or decode base-64. – Marc Gravell May 17 '18 at 11:42
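
The effect discussed in these comments can be reproduced in isolation. The sketch below is in Java rather than the C# client code and assumes the payload was decoded as UTF-8 text and later re-encoded as UTF-8, which is what the two dumps suggest; `Utf8RoundTripDemo` and its methods are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTripDemo {
    // Misuse under test: treat arbitrary binary as UTF-8 text, then turn it back into bytes.
    static byte[] throughUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Start of the client dump: tag, varint length 198, first 'a'.
        byte[] sent = {0x0A, (byte) 0xC6, 0x01, 0x61};
        byte[] received = throughUtf8String(sent);

        System.out.println("sent:     " + hex(sent));
        System.out.println("received: " + hex(received));
    }
}
```

0xC6 announces a two-byte UTF-8 sequence, but 0x01 is not a valid continuation byte, so the decoder substitutes U+FFFD, which re-encodes as EF BF BD. That matches the server dump. Lengths up to 127 fit in a single byte below 0x80, which is plain ASCII and survives the round trip, so short strings are unaffected.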