C# and Android/Java - cross-language binary stream writers/readers? (for primitives and UTF-8 strings)

Question

What is the easiest way to do binary serialization/deserialization of some custom data between C# and Android's Java? I'd like to find something for Java similar to C#'s BinaryWriter and BinaryReader, which support writing primitives (like uint16) and UTF-8 strings.

Or maybe there is a better way?

Edit: the structure of the data is not known at compile time.

Sample write:

        // The using block flushes and closes the file when writing is done.
        using (BinaryWriter w = new BinaryWriter(File.OpenWrite(@"D:\data")))
        {
            w.Write((UInt16)1234);
            w.Write("To jest żółwiątko");
            w.Write((UInt16)4567);
        }

score 2 · Accepted Answer · answered Aug 30 '11 at 02:52

In Java, all integer primitive types except char are signed (oddly, even byte!). So you will need to write out signed integers if you want to read them in Java using DataInputStream.readInt(). Also note that readInt() is big-endian, whereas C#'s BinaryWriter writes little-endian. You can use something like the EndianBinaryWriter from Jon Skeet's MiscUtil to write the values big-endian so that they can be read on Android.
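
If changing the C# side is not an option, the byte order and signedness can also be handled when reading. The following is only a minimal sketch of that alternative: the class `LittleEndianPrimitives` and its method names are made up for illustration, and it assumes the C# side keeps BinaryWriter's default little-endian layout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of any library.
final class LittleEndianPrimitives {
    private LittleEndianPrimitives() {}

    // Reads a UInt16 written by C# BinaryWriter (low byte first).
    // The result is widened to int because Java's short is signed.
    static int readUInt16(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int low = data.readUnsignedByte();
        int high = data.readUnsignedByte();
        return (high << 8) | low;
    }

    // Reads an Int32 written by C# BinaryWriter (little-endian).
    static int readInt32(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // readInt() assumes big-endian, so the byte order is swapped afterwards.
        return Integer.reverseBytes(data.readInt());
    }
}
```

Both methods throw EOFException (a subclass of IOException) if the stream ends in the middle of a value.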

UTF-8 is a little trickier, as DataInputStream.readUTF() uses an encoding called modified UTF-8 (MUTF-8) for strings. In the code we use to share data between Android and .NET, we represent a String as length-prefixed UTF-8 bytes (a length of -1 means null). Our reader method in Java looks something like this; it reads the standard UTF-8 bytes that the C# side writes with BinaryWriter after first writing the byte count as an Int16:

// readInt16(), readBytes() and ImageFileFormatException are defined elsewhere (not shown).
public String readUTF8String() throws ImageFileFormatException, IOException
{
    short len = readInt16();   // byte count of the UTF-8 data, or -1 for null
    if (len == -1)
        return null;
    if (len == 0)
        return "";
    if (len < -1)
        throw new ImageFileFormatException("Invalid UTF8 string");
    byte[] utf8Bytes = readBytes(len);
    return new String(utf8Bytes, "UTF-8");
}

The length prefix of the UTF-8 string is not written the same way by each platform. Java's writeUTF() writes a two-byte big-endian length, while C#'s BinaryWriter.Write(string) writes a 7-bit-encoded (LEB128) length. – Brett Ryan Dec 13 '13 at 21:17
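
The 7-bit-encoded length mentioned in this comment can be decoded on the Java side in a few lines. This is a rough sketch, not a tested library: `DotNetStringReader` and its method names are invented here, and it assumes the C# writer uses its default UTF-8 encoding, as in the question's sample.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class DotNetStringReader {
    private DotNetStringReader() {}

    // Decodes the 7-bit-encoded (unsigned LEB128) length prefix:
    // 7 payload bits per byte, lowest group first, high bit set means "more bytes follow".
    static int read7BitEncodedInt(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("Stream ended inside the length prefix");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("Length prefix is longer than 5 bytes");
    }

    // Reads a string written by C# BinaryWriter.Write(string) with UTF-8 encoding.
    static String readString(InputStream in) throws IOException {
        int byteCount = read7BitEncodedInt(in);
        if (byteCount < 0) throw new IOException("Negative string length");
        byte[] utf8 = new byte[byteCount];
        new DataInputStream(in).readFully(utf8);
        // Standard UTF-8 decoding, not the modified UTF-8 that readUTF() expects.
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

With this approach the C# code from the question can keep calling w.Write(string) unchanged; only the surrounding UInt16 values still need a little-endian read on the Java side.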

score 0 · Answer 2 · answered Aug 29 '11 at 22:47

Do either of these libraries meet your needs?:

Protocol Buffers - "Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats."
Apache Thrift - "Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml."

I know neither the structure nor the length of the stream; it will depend on runtime conditions. It looks like neither of the mentioned frameworks provides the freedom to write arbitrary primitives in arbitrary order... – tomash Aug 29 '11 at 22:58

score 0 · Answer 3 · edited Oct 28 '12 at 14:44

Some days ago I was facing the same situation. Here is my solution, try this (C# code):

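// Requires: using System; using System.IO; using System.Text;
public static void WriteUTF(this BinaryWriter writer, string s)
{
    // Encode once so the length prefix always matches the bytes written.
    byte[] utf8 = Encoding.UTF8.GetBytes(s);
    if (utf8.Length > ushort.MaxValue)
        throw new ArgumentException("String too long for a 2-byte length prefix", "s");
    // Big-endian 16-bit length, independent of BitConverter.IsLittleEndian.
    writer.Write((byte)(utf8.Length >> 8));
    writer.Write((byte)utf8.Length);
    writer.Write(utf8);
}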
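
For completeness, a possible Java counterpart to the extension method above: a two-byte big-endian byte count followed by standard UTF-8 bytes. This is only a sketch, and the class and method names are hypothetical. It decodes the bytes itself instead of calling readUTF(), so that strings containing characters which modified UTF-8 encodes differently still round-trip.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class PrefixedUtf8Reader {
    private PrefixedUtf8Reader() {}

    // Reads a 2-byte big-endian length followed by that many UTF-8 bytes.
    static String readPrefixedUtf8(DataInputStream in) throws IOException {
        int byteCount = in.readUnsignedShort(); // big-endian, 0..65535
        byte[] utf8 = new byte[byteCount];
        in.readFully(utf8);                     // blocks until all bytes have arrived
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```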

answered Mar 22 '12 at 09:01 by Acrux (edited by Baz)
how about performance of `GetBytes(length).Reverse().ToArray()`? – tomash Mar 23 '12 at 11:18
I didn't investigate performance for this. But it was applicable for my tasks. – Acrux Mar 26 '12 at 12:40
BitConverter.GetBytes(length).Reverse().ToArray() is just converting the short into byte array and reversing it. I also don't know the cost but it shouldn't be scary – Fuad Malikov May 15 '13 at 07:19
The byte count will not always be the same for UTF-8 multi-byte characters, since Java uses a modified UTF-8 variant. – Brett Ryan Dec 13 '13 at 21:24
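
To see where the two encodings actually diverge, the byte counts can be compared directly. The sketch below uses a made-up class name, `Utf8Lengths`, and measures modified UTF-8 by letting DataOutputStream.writeUTF() encode the string and subtracting its two-byte length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class Utf8Lengths {
    private Utf8Lengths() {}

    // Byte count of the string in standard UTF-8 (what C# Encoding.UTF8 produces).
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte count of the string in Java's modified UTF-8, as written by writeUTF().
    static int modifiedUtf8Length(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeUTF(s);
        return buffer.size() - 2; // drop the 2-byte length prefix
    }
}
```

For ordinary accented text such as "żółw" the two counts agree. They differ for U+0000 (one byte in standard UTF-8, two in modified UTF-8) and for characters outside the Basic Multilingual Plane such as emoji (four bytes versus six), so those are the cases where mixing readUTF() with a standard UTF-8 writer goes wrong.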
