- decode the received string (encoded using base64) to a string
- convert the decoded string to byte[]
Since you haven't provided code, it's hard to tell for sure, but I suspect these steps look something like this:
String encoded = readMessage();
String decoded = new String(base64Decode(encoded), "ASCII");
byte[] bytes = decoded.getBytes("UTF-8");
This is wrong, because what you are actually doing here is interpreting the decoded bytes as an ASCII string, and then converting that string back to bytes with UTF-8 encoding. I believe this will cause exactly the corruption you describe: if you look at the byte pattern, you'll see that the original message has exactly one byte with the top bit set (making it invalid ASCII), and in the corrupted message that byte has been replaced by the three-byte UTF-8 encoding of the Unicode replacement character U+FFFD (EF BF BD), which decoders use to mark errors. Your string data is not corrupted because it contains only ASCII, and the Protobuf metadata is not corrupted because none of your field numbers are greater than 15, so every tag is a single byte without the high bit set (which makes it valid ASCII).
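To make the effect concrete, here is a minimal sketch of that round trip. The three-byte message (field 1 holding the varint 150) is a made-up stand-in for your data, and the class and method names are mine, not anything from your code. The tag() helper just restates how Protobuf builds a tag from a field number and wire type, to show why field numbers up to 15 stay below 0x80.

```java
import java.nio.charset.StandardCharsets;

class AsciiRoundTripDemo {
    // The suspected bug: raw bytes -> ASCII String -> UTF-8 bytes.
    static byte[] roundTripThroughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.US_ASCII);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Protobuf tag value: field number in the upper bits, wire type in the low 3 bits.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Formats bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical message: tag 0x08 (field 1, varint), value 150 as varint 0x96 0x01.
        byte[] original = {0x08, (byte) 0x96, 0x01};
        System.out.println(hex(original));                          // 08 96 01
        System.out.println(hex(roundTripThroughString(original)));  // 08 ef bf bd 01
        System.out.println(tag(15, 2) < 0x80);  // true: still a single ASCII-range byte
        System.out.println(tag(16, 0) < 0x80);  // false: high bit set from field 16 on
    }
}
```

Only the 0x96 byte changes, and it grows from one byte to three, which matches the kind of corruption described in the question.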
In any case, the String class cannot contain raw bytes. It can only contain Unicode text. So it is wrong for the output of base64Decode() to ever be represented as a String. It must only be byte[]. That is, it should look like:
String encoded = readMessage();
byte[] decoded = base64Decode(encoded);
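If you don't already have a base64Decode() that returns byte[], a sketch using the standard java.util.Base64 class (Java 8 and later) could look like the following. The decodeMessage name and the sample message are my own placeholders; I'm assuming the sender used standard, non-URL-safe base64.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64BytesDemo {
    // Decodes base64 text straight to bytes; the decoded data never becomes a String.
    static byte[] decodeMessage(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Hypothetical message containing a byte with the high bit set (0x96).
        byte[] original = {0x08, (byte) 0x96, 0x01};
        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = decodeMessage(encoded);

        System.out.println(encoded);                           // CJYB
        System.out.println(Arrays.equals(original, decoded));  // true
    }
}
```

The resulting byte[] is what you would hand directly to your generated message class's parser.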