Sorry for the confusing title, but I can't think of a better way to explain it.

While browsing the source code of BitConverter recently, I came across a strange segment of code:

public static unsafe int ToInt32(byte[] value, int startIndex)
{
    fixed (byte* pbyte = &value[startIndex])
    {
        if (startIndex % 4 == 0) // data is aligned 
            return *((int*)pbyte);
        else
        { 
            if (IsLittleEndian)
            {  
                return (*pbyte) | (*(pbyte + 1) << 8)  | (*(pbyte + 2) << 16) | (*(pbyte + 3) << 24); 
            } 
            else
            { 
                return (*pbyte << 24) | (*(pbyte + 1) << 16)  | (*(pbyte + 2) << 8) | (*(pbyte + 3));                         
            } 
        }
    }
}

How can casting pbyte to an int* (line 6) violate data alignment in this scenario? I left it out for brevity, but the code has proper argument validation, so I'm pretty sure it can't be a memory access violation. Does it lose precision when cast?

In other words, why can't the code be simplified to:

public static unsafe int ToInt32(byte[] value, int startIndex)
{
    fixed (byte* pbyte = &value[startIndex])
    {
        return *(int*)pbyte;
    }
}

Edit: Here is the section of code in question.

Joe Amenta
James Ko
  • Operations on data aligned on a data-size boundary are faster, and on some CPUs access to non-aligned words/dwords/floats/doubles is an access violation... (comment as I don't have a good link handy). – Alexei Levenkov Aug 26 '15 at 01:16
  • @AlexeiLevenkov Still, you create two additional branches, multiple bitwise operations, *and* dereference `pbyte + 1`, `pbyte + 2`, etc. (3 of which are not going to be "aligned") just so you can avoid that non-alignment. Seems pretty overkill to me. – James Ko Aug 26 '15 at 03:21
  • Not sure about your comment - there are CPUs on which you flat out can't access an unaligned `int` - http://stackoverflow.com/questions/1237963/alignment-along-4-byte-boundaries - so what alternative implementation do you suggest compared to reading byte-by-byte? – Alexei Levenkov Aug 26 '15 at 03:40

1 Answer

I'd bet that this has to do with this part of §18.4 in version 5.0 of the C# specification (emphasis mine):

When one pointer type is converted to another, if the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined if the result is dereferenced.

Bytewise copying in the "unaligned" case is done to avoid relying on explicitly undefined behavior.
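
To make the bytewise path concrete, here is a minimal sketch in safe C# that builds the same value without ever dereferencing an `int*`. The class `UnalignedRead` and the method `ReadInt32Bytewise` are hypothetical names made up for illustration; this is not the framework's implementation, only the same shift-and-OR idea applied through array indexing.

```csharp
using System;

public static class UnalignedRead
{
    // Hypothetical helper (not part of the framework): assembles a 32-bit
    // integer one byte at a time, so no int-sized load is ever issued at a
    // possibly misaligned address.
    public static int ReadInt32Bytewise(byte[] value, int startIndex)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        if (startIndex < 0 || startIndex > value.Length - 4)
            throw new ArgumentOutOfRangeException(nameof(startIndex));

        if (BitConverter.IsLittleEndian)
        {
            // Lowest-addressed byte is the least significant one.
            return value[startIndex]
                 | (value[startIndex + 1] << 8)
                 | (value[startIndex + 2] << 16)
                 | (value[startIndex + 3] << 24);
        }

        // Big-endian machine: lowest-addressed byte is the most significant one.
        return (value[startIndex] << 24)
             | (value[startIndex + 1] << 16)
             | (value[startIndex + 2] << 8)
             | value[startIndex + 3];
    }
}
```

For any valid `startIndex`, aligned or not, `UnalignedRead.ReadInt32Bytewise(data, i)` should agree with `BitConverter.ToInt32(data, i)`, since both interpret the bytes in the machine's own byte order.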

Joe Amenta