4

I'm writing an optimized binary reader/writer as a learning exercise. Everything worked fine until I wrote tests for the encoding and decoding of decimals. My tests also check whether the BinaryWriter of the .NET Framework produces output compatible with my BinaryWriter and vice versa.

I'm mostly using unsafe code and pointers to write my variables into byte arrays. These are the results when writing a decimal via pointers and via the BinaryWriter:

BinaryWriter....: E9 A8 94 23 9B CA 4E 44 63 C5 44 39 00 00 1A 00
unsafe *decimal=: 00 00 1A 00 63 C5 44 39 E9 A8 94 23 9B CA 4E 44

My code writing a decimal looks like this:

unsafe
{
    byte[] data = new byte[16];

    fixed (byte* pData = data)
        *(decimal*)pData = 177.237846528973465289734658334m;
}

And using BinaryWriter of .NET Framework looks like this:

using (MemoryStream ms = new MemoryStream())
{
    using (BinaryWriter writer = new BinaryWriter(ms))
        writer.Write(177.237846528973465289734658334m);

    ms.ToArray();
}

Microsoft made their BinaryWriter incompatible with the way decimals are stored in memory. Looking into the reference source, we can see that BinaryWriter uses an internal method called GetBytes whose output differs from the in-memory layout of a decimal.

Is there a reason why Microsoft implemented writing decimals this way? And is it dangerous to use the unsafe approach to implement my own binary formats or protocols, because the internal layout of decimal may change in the future?

The unsafe approach performs considerably better than the GetBytes path used by BinaryWriter.

Scharle
  • 140
  • 1
  • 7
  • Possible duplicate of [Are the raw bytes written by .NET System.IO.BinaryWriter readable by other platforms?](https://stackoverflow.com/questions/33797958/are-the-raw-bytes-written-by-net-system-io-binarywriter-readable-by-other-platf) – GSerg Jan 02 '19 at 14:51
  • 1
@GSerg To be fair, that referenced answer only says it's a .NET-specific format. It doesn't really answer the OP's questions. – Neijwiert Jan 02 '19 at 14:55
  • @Neijwiert Well, no one can answer the OP's question of whether Microsoft will ever feel like changing this format. We can only speculate that it would be highly unlikely for compatibility reasons. – GSerg Jan 02 '19 at 14:56
  • 1
    @GSerg True, but I was kind of hoping for somebody to explain how the current implementation is done, as I cannot. – Neijwiert Jan 02 '19 at 14:58
  • @Neijwiert See the [second answer](https://stackoverflow.com/a/33798418/11683). `decimal --> decimal.GetBytes(), 16 bytes, should see the System.Decimal class code`. It's a typo though, should be `GetBits()`. – GSerg Jan 02 '19 at 14:58
It's not a typo. Both `GetBits()` and `GetBytes()` exist. – Matthias Jan 02 '19 at 18:34
"the unsafe way performs quite better" - did you include the I/O in that measurement? – H H Jan 02 '19 at 18:37
  • It's media agnostic. Therefore I measured only how fast it can write to and read from RAM. – Scharle Jan 02 '19 at 18:52
  • The performance of a serializer is all but media agnostic. – bommelding Jan 03 '19 at 08:06
The performance of a serializer is always media agnostic - especially when you compare different serializers that produce the same amount of data. There is no reason to waste CPU cycles even when the serializer doesn't use much CPU% because the network or disk isn't that fast. I want to see the performance of the serializer, not the performance of the disk or the NIC. – Scharle Jan 03 '19 at 09:15

1 Answer

2

Microsoft itself tried to keep the decimal and the alignment of its components as stable as possible. You can see this in the mentioned reference source of the .NET Framework:

// NOTE: Do not change the order in which these fields are declared. The
// native methods in this class rely on this particular order.
private int flags;
private int hi;
private int lo;
private int mid;

Together with [StructLayout(LayoutKind.Sequential)], the structure is laid out in memory in exactly that order.

You get different results because the GetBytes method writes the fields that make up the decimal internally in a different order than they are laid out in the structure itself:

internal static void GetBytes(Decimal d, byte[] buffer)
{
    Contract.Requires((buffer != null && buffer.Length >= 16), "[GetBytes]buffer != null && buffer.Length >= 16");
    buffer[0] = (byte)d.lo;
    buffer[1] = (byte)(d.lo >> 8);
    buffer[2] = (byte)(d.lo >> 16);
    buffer[3] = (byte)(d.lo >> 24);

    buffer[4] = (byte)d.mid;
    buffer[5] = (byte)(d.mid >> 8);
    buffer[6] = (byte)(d.mid >> 16);
    buffer[7] = (byte)(d.mid >> 24);

    buffer[8] = (byte)d.hi;
    buffer[9] = (byte)(d.hi >> 8);
    buffer[10] = (byte)(d.hi >> 16);
    buffer[11] = (byte)(d.hi >> 24);

    buffer[12] = (byte)d.flags;
    buffer[13] = (byte)(d.flags >> 8);
    buffer[14] = (byte)(d.flags >> 16);
    buffer[15] = (byte)(d.flags >> 24);
}

It seems to me that the responsible .NET developer tried to present the format produced by GetBytes in little-endian order, but went one step too far: they reordered not only the bytes within each component of the decimal but also the components themselves (flags, hi, lo, mid becomes lo, mid, hi, flags). Little-endian byte order, however, applies only to individual fields, not to whole structs - especially with [StructLayout(LayoutKind.Sequential)].
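This reordering can be observed without the internal GetBytes method by using the public decimal.GetBits, which returns the four components in the same lo, mid, hi, flags order that GetBytes writes. A minimal sketch (the raw dump assumes the current field layout of the struct and a little-endian machine):

```csharp
using System;

class Program
{
    static unsafe void Main()
    {
        decimal value = 177.237846528973465289734658334m;

        // Raw in-memory layout: flags, hi, lo, mid (field declaration order).
        byte[] raw = new byte[16];
        fixed (byte* pRaw = raw)
            *(decimal*)pRaw = value;

        // GetBits returns the documented component order: lo, mid, hi, flags,
        // which is exactly the order BinaryWriter emits.
        int[] bits = decimal.GetBits(value);
        byte[] writerLayout = new byte[16];
        for (int i = 0; i < 4; i++)
            BitConverter.GetBytes(bits[i]).CopyTo(writerLayout, i * 4);

        Console.WriteLine(BitConverter.ToString(raw));          // unsafe dump
        Console.WriteLine(BitConverter.ToString(writerLayout)); // BinaryWriter-style
    }
}
```

The two printed lines contain the same four 4-byte groups, just with the components in a different order, matching the two hex dumps in the question.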

My advice here is usually to use the methods Microsoft offers in its classes. So I would prefer a GetBytes- or GetBits-based way of serializing the data over doing it with unsafe, because Microsoft will keep compatibility with the BinaryWriter format in any case. That said, the comments make a serious point, and I wouldn't expect Microsoft to break the .NET Framework at this very basic level either.

It's hard for me to believe that performance matters so much that it justifies favouring the unsafe approach over GetBits; after all, we are talking about decimals here. You can still push the ints returned by GetBits into your byte[] via unsafe code.
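That combination could look like the following sketch (DecimalSerializer and its members are hypothetical names): GetBits supplies the documented, BinaryWriter-compatible component order, and the unsafe block copies the four ints into the buffer without a temporary byte[] per component.

```csharp
using System;

static class DecimalSerializer
{
    // Writes a decimal into the buffer in the documented lo, mid, hi, flags
    // order - the same order BinaryWriter uses.
    public static unsafe void Write(decimal value, byte[] buffer, int offset)
    {
        int[] bits = decimal.GetBits(value);
        fixed (byte* pBuffer = &buffer[offset])
        {
            int* p = (int*)pBuffer;
            p[0] = bits[0]; // lo
            p[1] = bits[1]; // mid
            p[2] = bits[2]; // hi
            p[3] = bits[3]; // flags
        }
    }

    // Reads it back via the documented Decimal(Int32[]) constructor.
    public static decimal Read(byte[] buffer, int offset)
    {
        int[] bits = new int[4];
        Buffer.BlockCopy(buffer, offset, bits, 0, 16);
        return new decimal(bits);
    }
}
```

This way the output stays compatible with BinaryWriter while avoiding a per-component allocation; only the component order, not the private field layout of the struct, is relied upon.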

Matthias
  • 948
  • 1
  • 6
  • 25
  • Internal structure of `decimal` is documented in the [`Decimal(Int32[])` constructor](https://learn.microsoft.com/en-us/dotnet/api/system.decimal.-ctor?view=netframework-4.7.2#System_Decimal__ctor_System_Int32___). It also explains why `decimal.GetBits()` returns an array of four `int`s, in a specific order, and what they mean. I do not see where there would be a bug with incorrect byte order. – GSerg Jan 02 '19 at 22:17
  • @GSerg As already mentioned in other comments: `GetBits()` is irrelevant for this because [`BinaryWriter` uses `GetBytes()` internally](https://referencesource.microsoft.com/#mscorlib/system/io/binarywriter.cs,251). Furthermore the linked documentation doesn't tell _why_ the memory layout of the `struct`ure differs from the layout returned by `GetBytes()` whereas my answer _does_. – Matthias Jan 02 '19 at 23:37
  • You have your reasoning backwards. `GetBits()` is the starting point because it is *documented*. What it returns cannot change. From this starting point we can see that `internal GetBytes()` returns the same data as the *documented* `GetBits()`, but always in the little endian format (whereas `GetBits()` will naturally use the system's current endianness) - which makes sense for portability between systems with different endianness. These two methods will not change (that would break ability to load `decimal`s persisted to storage before the hypothetical change). – GSerg Jan 03 '19 at 08:26
The internal order of the private fields comprising the `decimal` structure, on the contrary, may easily change at any moment because it is purely internal to the framework - but in reality that is not going to happen, because there is no reason to make such a change (e.g. introducing new fields would break `GetBits` and `GetBytes`; if that had to be done, they would come up with a new type, e.g. `decimal2`). But even so, the strict answer is No, there is no guarantee that what you see by casting `decimal*` to `byte*` will not change, and you should not rely on it. – GSerg Jan 03 '19 at 08:26