I am using Visual Studio 2005 with the .NET 2.0 build of protobuf-net r480.

I am following an example and trying to serialize a class with string, enum, int, and byte[] members, as below:

[ProtoContract]
public class Proto_DemoMessage {
    public enum ESSCommandType : int {
        ClientServer = 0,
        Broadcast = 10,
        Assign = 11,
    }

    [ProtoMember(1, IsRequired = false)]
    public string Name;
    [ProtoMember(3, IsRequired = true)]
    public ESSCommandType CommandType;
    [ProtoMember(4, IsRequired = false)]
    public int Code;
    [ProtoMember(5, IsRequired = false)]
    public byte[] BytesData;

    public byte[] Serialize(){
        byte[] b = null;
        using (MemoryStream ms = new MemoryStream()) {
            Serializer.Serialize<Proto_DemoMessage>(ms, this);
            b = new byte[ms.Position];
            byte[] fullB = ms.GetBuffer();
            Array.Copy(fullB, b, b.Length);
        }
        return b;
    }
}

I then assign a value to each field as follows:

Proto_DemoMessage inner_message = new Proto_DemoMessage();
inner_message.Name = "innerName";
inner_message.CommandType = Proto_DemoMessage.ESSCommandType.Broadcast;
inner_message.Code = 11;
inner_message.BytesData = System.Text.Encoding.Unicode.GetBytes("1234567890");

After calling inner_message.Serialize(), I write the resulting byte[] to a file. When I open the file in a hex viewer to verify it, I find that each byte of the byte[] data is followed by a 00 padding byte. The result is:

2A 14 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39 00 30 00

Did I do something wrong? I appreciate your help.

CodingBarfield
Marco2
  • You threw me with your `MemoryStream` code... you could just use `.ToArray()` there - it would be far more direct. But Cicada is entirely correct; you are including `BytesData` as UTF-16; which *has zeros* for ASCII-range characters. That is entirely self-inflicted, via the `BytesData = ` line – Marc Gravell May 30 '12 at 11:52
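
To illustrate the `.ToArray()` suggestion from the comment above, here is a minimal sketch, assuming the goal is simply to get exactly the bytes that were written to the stream. `StreamCopyDemo` and `WriteAndCopy` are hypothetical names, and `ms.Write` stands in for the `Serializer.Serialize` call from the question so that the snippet needs no external library.

```csharp
using System;
using System.IO;

public static class StreamCopyDemo
{
    // Hypothetical helper: writes payload into a MemoryStream and returns
    // exactly the bytes written. ms.Write is a stand-in for
    // Serializer.Serialize(ms, this) from the question.
    public static byte[] WriteAndCopy(byte[] payload)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            // ToArray copies only the written portion of the stream.
            // GetBuffer would return the whole internal buffer, which
            // may be larger than the data and contain unused bytes.
            return ms.ToArray();
        }
    }
}
```

The manual `GetBuffer` plus `Array.Copy` approach in the question also works as long as `Position` equals the number of bytes written, but `ToArray` expresses the same thing in one call.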

2 Answers

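Everything's OK with the serialization. Your string is encoded in UTF-16 (that is what Encoding.Unicode produces). In this encoding every character is at least two bytes wide, and for ASCII-range characters such as digits the second byte is 00.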
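
The hex dump in the question can be read byte by byte. The following is a minimal sketch, assuming the standard protobuf wire format, where a single-byte field header holds the field number in its upper bits and the wire type in its lowest three bits. `HexDumpDemo` and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class HexDumpDemo
{
    // For a single-byte field header, the field number is the header
    // shifted right by three bits.
    public static int FieldNumber(byte header)
    {
        return header >> 3;
    }

    // The wire type is the lowest three bits; 2 means length-delimited
    // (used for strings, byte arrays and nested messages).
    public static int WireType(byte header)
    {
        return header & 0x07;
    }

    // Encoding.Unicode is UTF-16 little-endian, so each ASCII-range
    // character becomes its character code followed by a 0x00 byte.
    public static string Utf16Hex(string text)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(text));
    }
}
```

Here `FieldNumber(0x2A)` is 5 and `WireType(0x2A)` is 2, which matches `[ProtoMember(5)]` on `BytesData`; the next byte, 0x14, is the payload length of 20 bytes. `Utf16Hex("12")` gives `31-00-32-00`, the same pattern as in the dump, so the zeros come from the encoding chosen for the test data rather than from the serializer.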

user703016
  • Actually, there could be **two** problems! I take it back! The memory stream usage **and** bad encoding. Sorry. I made a mistake. I'm undeleting - my sincere apologies. – Marc Gravell May 30 '12 at 11:50
  • @MarcGravell No worries! But you really got me confused for a moment! :) – user703016 May 30 '12 at 11:52
  • yes, sorry about that. The... "unusual" MemoryStream copy code threw me, and I chased the wrong lead. Entirely my bad. – Marc Gravell May 30 '12 at 11:53

While checking whether these zeros really come from the Unicode encoding, I also confirmed that UTF-8 is more compact (as one would expect), so using

inner_message.BytesData = System.Text.Encoding.UTF8.GetBytes("1234567890");

might do some good.

Update: or simply use a string property, as per Marc Gravell's suggestion.
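
The size difference can be checked without serializing anything. This is a small sketch using `Encoding.GetByteCount`; `EncodingSizeDemo` and its methods are hypothetical names, and the figures quoted afterwards apply to ASCII-range text such as the test string.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Number of bytes the text occupies as UTF-16 (Encoding.Unicode).
    public static int Utf16Size(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    // Number of bytes the text occupies as UTF-8.
    public static int Utf8Size(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Encodes to UTF-8 and decodes again, to show the bytes can be
    // turned back into the original string.
    public static string RoundTripUtf8(string text)
    {
        return Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));
    }
}
```

For "1234567890" this gives 20 bytes as UTF-16 and 10 bytes as UTF-8. Characters outside the ASCII range take two or more bytes in UTF-8, so the saving depends on the text.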

Eugene Ryabtsev
  • I would say: if you want to store a string such as "1234567890", then *have a string property*, and let the serializer worry about the encoding ;p – Marc Gravell May 30 '12 at 11:54
  • @Marc Gravell: or that. Is it fully Unicode-capable out of the box? How does it encode by default? – Eugene Ryabtsev May 30 '12 at 11:56
  • yes, the protobuf spec (by Google) explicitly states UTF-8 is to be used for character data, so a simple `string` would work fine, and would get encoded via UTF-8 as per your answer. But automatically. – Marc Gravell May 30 '12 at 11:57
  • Of course using a `string` would be better if the field is *actually meant* to be a string, yet going by the name `BytesData` I'd say these really are going to be *bytes* and OP was just testing whether it worked. (Pure assumptions there). – user703016 May 30 '12 at 12:05
  • Yes, I was just testing support for the byte[] datatype, using a string of characters as an example. The actual content of the byte[] will be another serialized message, but I used the Unicode encoding without noticing. Maybe there is a better method for testing it. Thank you all for the suggestions. – Marco2 May 31 '12 at 01:30