
For serialization of primitive arrays, I'm wondering how to convert a primitive[] to its corresponding byte[] (i.e. an int[128] to a byte[512], or a ushort[] to a byte[]...). The destination can be a MemoryStream, a network message, a file, anything. The goal is performance (serialization and deserialization time): being able to write a byte[] to a stream in one shot, instead of looping through all the values or allocating through some converter.

Solutions already explored:

Regular Loop to write/read

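```csharp
// array = any int[]; myStreamWriter = the asker's writer (its type is not shown)
myStreamWriter.WriteInt32(array.Length);   // length prefix
for (int i = 0; i < array.Length; ++i)
   myStreamWriter.WriteInt32(array[i]);    // one call per element
```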

This solution works for both serialization and deserialization, and it is roughly 100 times faster than using the standard System.Runtime.Serialization combined with a BinaryFormatter to serialize/deserialize a single int or a couple of them.

But this solution becomes slower once the array contains more than 200-300 values (for Int32).
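For reference, here is a self-contained version of that loop, as a sketch. The asker's `myStreamWriter` type isn't shown, so `System.IO.BinaryWriter` and `BinaryReader` stand in for it; `LoopSerializer`, `Serialize` and `Deserialize` are hypothetical names. The format is a 4-byte length prefix followed by 4 bytes per element.

```csharp
using System;
using System.IO;

public static class LoopSerializer
{
    // Writes a length prefix, then each element with its own Write call.
    public static byte[] Serialize(int[] array)
    {
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(array.Length);            // 4-byte length prefix
            for (int i = 0; i < array.Length; ++i)
                writer.Write(array[i]);            // 4 bytes per element, little-endian
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }

    // Reads back the format produced by Serialize.
    public static int[] Deserialize(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        var result = new int[reader.ReadInt32()];
        for (int i = 0; i < result.Length; ++i)
            result[i] = reader.ReadInt32();
        return result;
    }
}
```

Each element still costs one `Write` call, so per-call overhead grows linearly with the array length. That per-call cost, rather than the `for` loop itself, is a plausible suspect for the slowdown, but it should be measured before drawing conclusions.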

Cast?

It seems C# can't directly cast an int[] to a byte[], or a bool[] to a byte[].
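A minimal demonstration of that limitation, as a sketch (`CastDemo` is a hypothetical name). The compiler rejects `(byte[])ints` outright; going through `object` makes it compile, but the runtime type check then throws `InvalidCastException`, which matches what the asker reports in the comments.

```csharp
using System;

public static class CastDemo
{
    // Returns true when the runtime refuses to treat an int[] as a byte[].
    public static bool IntArrayToByteArrayCastFails()
    {
        int[] ints = { 1, 2, 3 };
        object boxed = ints;      // (byte[])ints would not compile (CS0030)
        try
        {
            _ = (byte[])boxed;    // compiles, but the runtime type check fails
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```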

BitConverter.GetBytes()

This solution works, but it allocates a new byte[] on every iteration of the loop over my int[]. Performance is, of course, horrible.
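The per-element allocation can be avoided by writing each value into one preallocated buffer. A sketch, assuming .NET Core 2.1 or later, where `System.Buffers.Binary.BinaryPrimitives` and `Span<T>` are available (they were not part of the .NET Framework in 2015); `NoAllocWriter.Write` is a hypothetical helper. It also fixes the byte order to little-endian explicitly.

```csharp
using System;
using System.Buffers.Binary;

public static class NoAllocWriter
{
    // Writes every element of 'values' into 'destination' in little-endian order,
    // without allocating a temporary byte[] per element as BitConverter.GetBytes does.
    public static int Write(int[] values, byte[] destination)
    {
        int needed = values.Length * sizeof(int);
        if (destination.Length < needed)
            throw new ArgumentException("Destination buffer is too small.", nameof(destination));

        Span<byte> span = destination;
        for (int i = 0; i < values.Length; ++i)
            BinaryPrimitives.WriteInt32LittleEndian(span.Slice(i * sizeof(int)), values[i]);
        return needed; // number of bytes written
    }
}
```

The caller can allocate `destination` once and reuse it for every array, so the only per-call work is the copy itself.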

Marshal.Copy

Yes, this solution works too, but it has the same problem as the BitConverter one.
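For comparison, `Marshal.Copy` has an overload that copies a whole `int[]` to an `IntPtr` in a single call, so with a pinned, reusable `byte[]` it needs neither a per-element allocation nor a loop. A sketch; `MarshalCopyWriter` is a hypothetical name, and the bytes come out in the host's native byte order.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MarshalCopyWriter
{
    // Copies all of 'values' into 'destination' with one Marshal.Copy call.
    // 'destination' can be allocated once and reused across calls.
    public static int Write(int[] values, byte[] destination)
    {
        int needed = values.Length * sizeof(int);
        if (destination.Length < needed)
            throw new ArgumentException("Destination buffer is too small.", nameof(destination));

        // Pin the byte[] so the GC cannot move it while we copy to its address.
        GCHandle handle = GCHandle.Alloc(destination, GCHandleType.Pinned);
        try
        {
            Marshal.Copy(values, 0, handle.AddrOfPinnedObject(), values.Length);
        }
        finally
        {
            handle.Free();
        }
        return needed;
    }
}
```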

C++ hack

Because a direct cast is not allowed in C#, I tried a C++ hack after noticing, by inspecting memory, that the array length is stored 4 bytes before the array data starts:

```cpp
ARRAYCAST_API void Cast(int* input, unsigned char** output)
{
   // get the address of the input (this is a pointer to the data)
   int* count = input;
   // the size of the buffer is located just before the data (4 bytes before as this is an int)
   count--;
   // multiply the number of elements by 4 as an int is 4 bytes
   *count = *count * 4;
   // set the address of the byte array
   *output = (unsigned char*)(input);
}
```

And the C# code that calls it:

```csharp
// fptr: delegate bound to the native Cast export (declaration not shown)
byte[] arrayB = null;
int[] arrayI = new int[128];
for (int i = 0; i < 128; ++i)
   arrayI[i] = i;

// delegate call
fptr(arrayI, out arrayB);
```

I successfully receive my int[128] in C++, change the array length, and assign the right address to my 'output' variable, but C# only gets back a byte[1]. It seems I can't hack a managed variable that easily.

So I'm really starting to think that all these casts I want (int[] -> byte[], bool[] -> byte[], double[] -> byte[]...) are simply impossible in C# without allocating/copying...

What am I missing?

leppie
jlevet
  • Can you be more specific about what you are trying to do? You are serializing arrays? Serializing where? The HDD could be your real bottleneck. And perhaps you should use `byte[]` to hold your original data (then there is no need to serialize, but retrieving data is tricky). – Sinatr Jun 22 '15 at 14:54
  • I'm surprised that the Regular Loop performs so badly. It strikes me that it should do better; merely looping over two or three hundred values shouldn't cause performance problems on its own. Perhaps you should investigate this further rather than writing it off as an unavoidable performance impact. – Chris Jun 22 '15 at 14:55
  • One note on the "C++ hack": you can mess with pointers and the like in C# if you want to. I've not done it myself, but https://msdn.microsoft.com/en-us/library/f58wzh21(VS.80).aspx might be a starting point if you wanted to look at it. I've come across it in the context of fast manipulation of bitmap data in images, but given that is also just manipulating arrays, it might work for you if you really need performance. – Chris Jun 22 '15 at 14:58
  • @Sinatr I've edited my top message, but the goal is just to binary-serialize arrays of primitive values (can be byte, sbyte, short, ushort, int, sint, long, slong, double, decimal, DateTime, TimeSpan or bool) as quickly as I can. The destination doesn't matter; it can be a MemoryStream, network messages, or files. – jlevet Jun 22 '15 at 15:15
  • @Chris I forgot to say, but I've already tried fixed blocks unsuccessfully; I even tried direct IL code to cast, but I always get an InvalidCastException as a result. – jlevet Jun 22 '15 at 15:17
  • Sounds like you want an `UnmanagedMemoryMappedFile` with some `unsafe` ops which works just fine. – leppie Jun 22 '15 at 15:18
  • How about endianness? Are they guaranteed to be the same on both ends? – user4003407 Jun 22 '15 at 15:18
  • @PetSerAl: Everything is pretty much the same endianness in modern archs. – leppie Jun 22 '15 at 15:19
  • @jlevet: Ah, ok. With respect to the invalid cast you can't directly convert an int32 array to a byte array that is 4 times as long. That just isn't possible. At best you might be able to convert it to a byte array of the same length which would cast each int32 to a byte but obviously you are losing data then. So basically don't look at casting. – Chris Jun 22 '15 at 15:44
  • With regards to your original loop, I assume it is the writer that is the bottleneck there; I can't imagine it's the simple for loop. Have you tried putting the values into an intermediate form (e.g. a MemoryStream) and then writing that to your final destination as a single operation? – Chris Jun 22 '15 at 15:47
  • @PetSerAl in my case, yes they are – jlevet Jun 22 '15 at 16:03
  • @Chris of course I can't convert an int[128] into a byte[128]. I'm talking here about a cast from an int[128] to a byte[512]. As for the MemoryStream, I'm already serializing my arrays into a MemoryStream and deserializing them from it. – jlevet Jun 22 '15 at 16:04
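On the endianness question raised in the comments: a bulk copy such as `Buffer.BlockCopy` reproduces the host's byte order, so a wire format meant to be portable has to pin down one order explicitly. A sketch that keeps the single bulk copy on little-endian hosts and falls back to per-element little-endian writes otherwise, assuming .NET Core 2.1 or later for `BinaryPrimitives`; `WireFormat.ToLittleEndianBytes` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;

public static class WireFormat
{
    // Produces little-endian bytes for 'values' regardless of host byte order.
    public static byte[] ToLittleEndianBytes(int[] values)
    {
        var bytes = new byte[values.Length * sizeof(int)];
        if (BitConverter.IsLittleEndian)
        {
            // Host order already matches the wire order: one bulk copy.
            Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        }
        else
        {
            // Big-endian host: write each element in little-endian order
            // (WriteInt32LittleEndian reverses the bytes as needed).
            for (int i = 0; i < values.Length; ++i)
                BinaryPrimitives.WriteInt32LittleEndian(bytes.AsSpan(i * sizeof(int)), values[i]);
        }
        return bytes;
    }
}
```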

1 Answer


How about using Buffer.BlockCopy?

```csharp
using System;
using System.Linq;   // for SequenceEqual

// serialize
var intArray = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var byteArray = new byte[intArray.Length * 4];
Buffer.BlockCopy(intArray, 0, byteArray, 0, byteArray.Length);   // count is in bytes

// deserialize and test
var intArray2 = new int[byteArray.Length / 4];
Buffer.BlockCopy(byteArray, 0, intArray2, 0, byteArray.Length);
Console.WriteLine(intArray.SequenceEqual(intArray2));    // True
```

Note that this still allocates (the destination array) and copies the data. I'm fairly sure that this is unavoidable in managed code, and BlockCopy is probably about as good as it gets for this.
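This answer predates `Span<T>`. On .NET Core 2.1 and later, `System.Runtime.InteropServices.MemoryMarshal.AsBytes` returns a `Span<byte>` view over an `int[]` without copying, which is the reinterpretation the question asks for, and `Stream.Write(ReadOnlySpan<byte>)` / `Stream.Read(Span<byte>)` accept such a view directly. A sketch under that assumption (`SpanSerializer` is a hypothetical name); the bytes are in host byte order, and the stream itself still copies them into its own storage.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SpanSerializer
{
    // Writes the raw bytes of 'values' to 'stream' without building an intermediate byte[].
    public static void Write(Stream stream, int[] values)
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(values.AsSpan());
        stream.Write(bytes);
    }

    // Fills 'values' from 'stream', reading straight into the int[]'s memory.
    public static void Read(Stream stream, int[] values)
    {
        Span<byte> bytes = MemoryMarshal.AsBytes(values.AsSpan());
        while (!bytes.IsEmpty)
        {
            int n = stream.Read(bytes);
            if (n == 0) throw new EndOfStreamException();
            bytes = bytes.Slice(n);   // continue after the bytes already read
        }
    }
}
```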

LukeH
  • Does `BlockCopy` work for primitive valuetype arrays? (I suspect so) – leppie Jun 22 '15 at 15:21
  • That could be a good way to go. I'll try this today, with some kind of static locked byte[] buffer to avoid multiple allocations and stay thread-safe. Thanks for the idea – jlevet Jun 23 '15 at 06:48
  • I'm using this method except for decimal type. – jlevet Jun 25 '15 at 15:16
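On leppie's question: `Buffer.BlockCopy` works on arrays of any primitive type, and `Buffer.ByteLength` gives the byte count. Both throw `ArgumentException` for non-primitive element types such as `decimal` or `DateTime`, which fits the asker's note that `decimal` needs a different route. A generic sketch (`PrimitiveArrayBytes` is a hypothetical name):

```csharp
using System;

public static class PrimitiveArrayBytes
{
    // Copies any primitive array (int[], double[], bool[], ushort[], ...) into a new byte[].
    // Throws ArgumentException for non-primitive element types such as decimal or DateTime.
    public static byte[] ToBytes(Array source)
    {
        var bytes = new byte[Buffer.ByteLength(source)];
        Buffer.BlockCopy(source, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Copies bytes back into a caller-allocated primitive array of matching byte length.
    public static void FromBytes(byte[] bytes, Array destination)
    {
        if (Buffer.ByteLength(destination) != bytes.Length)
            throw new ArgumentException("Length mismatch.", nameof(destination));
        Buffer.BlockCopy(bytes, 0, destination, 0, bytes.Length);
    }
}
```

For `decimal[]`, one option is `decimal.GetBits`, which returns the four `int`s that make up a value, together with the `decimal(int[])` constructor to rebuild it; those `int`s can then go through the same helper.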