3

I have some unsafe C# code that does pointer arithmetic on large blocks of memory on type byte*, running on a 64-bit machine. It works correctly most of the time but when things get large I often get some kind of corruption where the pointer gets incorrect.

The strange thing is that if I turn on "Check for arithmetic overflow/underflow" everything works correctly. I do not get any overflow exceptions. But due to the large performance hit I need to run the code without this option.

What could be causing this difference in behavior?

Magnus Krisell
  • Can you show us some of your code? – aL3891 Jun 12 '11 at 11:24
  • Offset signed and the pointer unsigned? You rarely need a negative offset. –  Jun 12 '11 at 11:32
  • The relevant parts of the code simply look like this: this.currentLocation += sizeof(byte), this.currentLocation += numberOfChildren * sizeof(ushort), this.currentLocation += subTree.Length etc. No negative offsets. – Magnus Krisell Jun 12 '11 at 11:35

4 Answers

6

The difference between checked and unchecked here is actually a bit of a bug in the IL, or just some bad source code (I'm not a language expert, so I will not comment on whether the C# compiler is generating the correct IL for the ambiguous source code). I compiled this test code using the 4.0.30319.1 version of the C# compiler (although the 2.0 version seemed to do the same thing). The command line options I used were: /o+ /unsafe /debug:pdbonly.

For the unchecked block, we have this IL code:

//000008:     unchecked
//000009:     {
//000010:         Console.WriteLine("{0:x}", (long)(testPtr + offset));
  IL_000a:  ldstr      "{0:x}"
  IL_000f:  ldloc.0
  IL_0010:  ldloc.1
  IL_0011:  add
  IL_0012:  conv.u8
  IL_0013:  box        [mscorlib]System.Int64
  IL_0018:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                object)

At IL offset 11, the add gets 2 operands, one of type byte* and the other of type uint32. Per the CLI spec these are really normalized into native int and int32, respectively. According to the CLI spec (partition III to be precise), the result will be native int. Thus the second operand must be promoted to native int. According to the spec, this is accomplished via a sign extension. So uint.MaxValue (which is 0xFFFFFFFF, or -1 in signed notation) is sign extended to 0xFFFFFFFFFFFFFFFF. Then the 2 operands are added (0x0000000008000000L + (-1L) = 0x0000000007FFFFFFL). The conv opcode is only needed for verification purposes, to convert the native int into an int64, which in the generated code is a nop.

Now for the checked block, we have this IL:

//000012:     checked
//000013:     {
//000014:         Console.WriteLine("{0:x}", (long)(testPtr + offset));
  IL_001d:  ldstr      "{0:x}"
  IL_0022:  ldloc.0
  IL_0023:  ldloc.1
  IL_0024:  add.ovf.un
  IL_0025:  conv.ovf.i8.un
  IL_0026:  box        [mscorlib]System.Int64
  IL_002b:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                object)

It is virtually identical, except for the add and conv opcodes. For the add opcode we've added 2 'suffixes'. The first one is the ".ovf" suffix, which has an obvious meaning: check for overflow. But it is also required to enable the second suffix, ".un" (i.e. there is no "add.un", only "add.ovf.un"). The ".un" has 2 effects. The most obvious one is that the addition and overflow checking are done as if the operands were unsigned integers. From our CS classes way back when, hopefully we all remember that thanks to two's complement binary encoding, signed addition and unsigned addition are the same, so the ".un" really only impacts the overflow checking, right?

Wrong.

Remember that on the IL stack we don't have 2 64-bit numbers, we have an int32 and a native int (after normalization). Well, the ".un" means that the conversion from int32 to native int is treated like a "conv.u" rather than the default "conv.i" as above. Thus uint.MaxValue is zero extended to 0x00000000FFFFFFFFL. Then the add correctly produces 0x0000000107FFFFFFL. The conv opcode makes sure the unsigned operand can be represented as a signed int64 (which it can).
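
To make the two widenings concrete, here is a minimal sketch in safe C# that models them on plain 64-bit integers instead of real pointers. The class and helper names (ExtensionDemo, SignExtend, ZeroExtend) are made up for illustration; they only mimic what "conv.i" and "conv.u" do to the 32-bit operand and are not what the compiler or JIT actually emits.

```csharp
using System;

public static class ExtensionDemo
{
    // Hypothetical helper: mimics the "conv.i" style widening used by a plain "add".
    // The 32 bits are reinterpreted as int32 and then sign-extended to 64 bits.
    public static long SignExtend(uint value)
    {
        return unchecked((int)value);
    }

    // Hypothetical helper: mimics the "conv.u" style widening used by "add.ovf.un".
    // The 32 bits are zero-extended to 64 bits.
    public static long ZeroExtend(uint value)
    {
        return value;
    }

    public static void Main()
    {
        long address = 0x08000000L;   // same dummy address as in the test code
        uint offset = uint.MaxValue;  // 0xFFFFFFFF

        Console.WriteLine("{0:x}", address + SignExtend(offset)); // 7ffffff
        Console.WriteLine("{0:x}", address + ZeroExtend(offset)); // 107ffffff
    }
}
```

The two printed values line up with the unchecked and checked results discussed above, assuming a 64-bit process.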

Your fix works just fine for 64-bit. At the IL level, a more correct fix would be to explicitly convert the uint32 operand to native int or unsigned native int; then both checked and unchecked would behave identically for both 32-bit and 64-bit.

  • Thanks! I guess I should've taken the time to study the IL when the problem occurred but at that point I just wanted the code to work. I consider this project to be "64-bit only" BTW, since it needs several GB of memory in realistic use. (I wish I could afford a server with 144 GB RAM.) – Magnus Krisell Jun 13 '11 at 21:07
3

Please double-check your unsafe code. Reading or writing memory outside the allocated block of memory causes that 'corruption'.

HandMadeOX
  • Yeah, of course. I've been writing code like this for years (in C++ too); the thing that's new is working with 64-bit pointers and finding out that checking for overflow changes behavior, without causing any overflow exceptions... – Magnus Krisell Jun 12 '11 at 11:52
  • There is no difference between x86 and x64 as far as this issue is concerned. Just one single byte accidentally read or written through an invalid pointer, and the issue pops up in another place. Check your other unsafe code rather than the main algorithm where the error occurs. – HandMadeOX Jun 12 '11 at 14:51
  • I agree. When the algorithm works correctly the pointer will never be invalid. This is (was) just a symptom of problems with (64-bit related) pointer arithmetic. Anyway, I have solved the problem and written a short piece of code that demonstrates it. Will post it as an answer in a few hours. – Magnus Krisell Jun 12 '11 at 15:44
  • Sounds funny, like: long ptr = int.MaxValue; int offset = 10; unchecked { ptr = (long)((int)ptr + offset); } – HandMadeOX Jun 12 '11 at 16:31
3

It's a C# compiler bug (filed on Connect). @Grant has shown that the MSIL generated by the C# compiler interprets the uint operand as signed. That's wrong according to the C# spec. Here's the relevant section (18.5.6):

18.5.6 Pointer arithmetic

In an unsafe context, the + and - operators (§7.8.4 and §7.8.5) can be applied to values of all pointer types except void*. Thus, for every pointer type T*, the following operators are implicitly defined:

T* operator +(T* x, int y);
T* operator +(T* x, uint y);
T* operator +(T* x, long y);
T* operator +(T* x, ulong y);
T* operator +(int x, T* y);
T* operator +(uint x, T* y);
T* operator +(long x, T* y);
T* operator +(ulong x, T* y);
T* operator -(T* x, int y);
T* operator -(T* x, uint y);
T* operator -(T* x, long y);
T* operator -(T* x, ulong y);
long operator -(T* x, T* y);

Given an expression P of a pointer type T* and an expression N of type int, uint, long, or ulong, the expressions P + N and N + P compute the pointer value of type T* that results from adding N * sizeof(T) to the address given by P. Likewise, the expression P - N computes the pointer value of type T* that results from subtracting N * sizeof(T) from the address given by P.

Given two expressions, P and Q, of a pointer type T*, the expression P - Q computes the difference between the addresses given by P and Q and then divides that difference by sizeof(T). The type of the result is always long. In effect, P - Q is computed as ((long)(P) - (long)(Q)) / sizeof(T).

If a pointer arithmetic operation overflows the domain of the pointer type, the result is truncated in an implementation-defined fashion, but no exceptions are produced.


You're allowed to add a uint to a pointer; no implicit conversion takes place. And the operation does not overflow the domain of the pointer type, so truncation is not allowed.
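
As a rough illustration of that reading of the spec, the sketch below models "P + N adds N * sizeof(T) to the address" on plain 64-bit integers. SpecModel and AddPerSpec are hypothetical names, the address is an ordinary ulong rather than a real pointer, and wrapping on 64-bit overflow is just one possible choice for the "implementation-defined" truncation.

```csharp
using System;

public static class SpecModel
{
    // Hypothetical model of T* + uint per section 18.5.6:
    // the uint operand is used as an unsigned value, scaled by sizeof(T).
    public static ulong AddPerSpec(ulong address, uint n, uint elementSize)
    {
        // Assumption: wrap around if the 64-bit address domain overflows.
        return unchecked(address + (ulong)n * elementSize);
    }

    public static void Main()
    {
        ulong address = 0x08000000UL; // dummy address from the question's test case

        // byte* + uint.MaxValue: no truncation expected in a 64-bit address space
        Console.WriteLine("{0:x}", AddPerSpec(address, uint.MaxValue, 1)); // 107ffffff

        // ushort* + uint.MaxValue: the offset is scaled by sizeof(ushort) == 2
        Console.WriteLine("{0:x}", AddPerSpec(address, uint.MaxValue, 2)); // 207fffffe
    }
}
```

Under this model the first result matches what the checked build prints, which is the behavior the spec text seems to call for.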

Ben Voigt
1

I'm answering my own question as I have solved the problem, but would still be interested in reading comments about why the behavior changes with checked vs unchecked.

This code demonstrates the problem as well as the solution (always casting the offset to long before adding):

public static unsafe void Main(string[] args)
{
    // Dummy pointer, never dereferenced
    byte* testPtr = (byte*)0x00000008000000L;

    uint offset = uint.MaxValue;

    unchecked
    {
        Console.WriteLine("{0:x}", (long)(testPtr + offset));
    }

    checked
    {
        Console.WriteLine("{0:x}", (long)(testPtr + offset));
    }

    unchecked
    {
        Console.WriteLine("{0:x}", (long)(testPtr + (long)offset));
    }

    checked
    {
        Console.WriteLine("{0:x}", (long)(testPtr + (long)offset));
    }
}

This will print (when run on a 64-bit machine):

7ffffff
107ffffff
107ffffff
107ffffff

(BTW, in my project I first wrote all the code as managed code without all this unsafe pointer arithmetic nastiness but found out it was using too much memory. This is just a hobby project; the only one that gets hurt if it blows up is me.)

Magnus Krisell
  • I'm not sure if this is related, but I recently found [a bug in the x64 JIT](https://connect.microsoft.com/VisualStudio/feedback/details/674232/jit-optimizer-error-when-loop-controlling-variable-approaches-int-maxvalue) that results in constants outside the range of `Int32` losing their value due to integer wraparound when checked arithmetic is disabled. I'll post a comment on that bug mentioning this test case. – Ben Voigt Jun 12 '11 at 19:41
  • Ok, see @Grant's answer. He's an MS engineer responsible for the JIT, so he definitely knows what's up. Apparently the MSIL produced by the C# compiler already has the "wrong" behavior, so it's not a JIT bug. I'll look at the C# spec and try to figure out whether the C# compiler is allowed to produce that MSIL from your code. – Ben Voigt Jun 13 '11 at 18:43
  • Thanks for your insightful comments! – Magnus Krisell Jun 13 '11 at 20:51