17

Consider the following XE6 code. The intention is that ThingData should be written to the console for both Thing1 & Thing2, but it is not. Why is that?

program BytesFiddle;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

type
  TThing = class
  private
    FBuf : TBytes;
    FData : TBytes;
    function GetThingData: TBytes;
    function GetThingType: Byte;
  public
    property ThingType : Byte read GetThingType;
    property ThingData : TBytes read GetThingData;

    constructor CreateThing(const AThingType : Byte; const AThingData: TBytes);
  end;

{ TThing1 }

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + 1);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));

  FData := @FBuf[1];
  SetLength(FData, Length(FBuf) - 1);
end;

function TThing.GetThingData: TBytes;
begin
  Result := FData;
end;

function TThing.GetThingType: Byte;
begin
  Result := FBuf[0];
end;

var
  Thing1, Thing2 : TThing;

begin
  try
    Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
    Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));

    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));

    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));

    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
Johan
  • 74,508
  • 24
  • 191
  • 319
Hugh Jones
  • 2,706
  • 19
  • 30

4 Answers4

33

Let me walk you through the ways in which this code fails and how the compiler allows you to shoot yourself in the foot.

If you step through the code using the debugger you can see what happens.

enter image description here

After the initialization of Thing1 you can see that FData is filled with all zeros.
Strangely enough Thing2 is fine.
Therefore the error is in CreateThing. Let's investigate further...

In the oddly named constructor CreateThing you have the following line:

FData := @FBuf[1];

This looks like a simple assignment, but is really a call to DynArrayAssign

Project97.dpr.32: FData := @FBuf[1];
0042373A 8B45FC           mov eax,[ebp-$04]
0042373D 83C008           add eax,$08
00423743 8B5204           mov edx,[edx+$04]
00423746 42               inc edx
00423747 8B0DE03C4000     mov ecx,[$00403ce0]
0042374D E8E66DFEFF       call @DynArrayAsg      <<-- lots of stuff happening here.  

One of the checks DynArrayAsg performs is to check whether the source dynamic array is empty or not.
DynArrayAsg also does a few other things which you need to be aware about.

Let's first have a look at the structure of a dynamic array; it's not just a simple pointer to an array!

Offset 32/64  |   Contents     
--------------+--------------------------------------------------------------
-8/-12        | 32 bit reference count
-4/-8         | 32 or 64 bit length indicator 
 0/ 0         | data of the array.

Performing FData = @FBuf[1] you are messing up with the prefix fields of the dynamic array.
The 4 bytes in front of @Fbuf[1] are interpreted as the length.
For Thing1 these are:

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 00 00 00  08 00 00 00  00  'S' 'n' .....
FData:    00 00 00 08  00 00 00 00  .............. //Hey that's a zero length.

Oops, when DynArrayAsg starts investigating it sees that what it thinks is the source for the assign has a length of zero, i.e. it thinks the source is empty and does not assign anything. It leaves FData unchanged!

Does Thing2 work as intended?
It looks like it does, but it actually fails in rather a bad way, let me show you.

enter image description here

You've successfully tricked the runtime into believing @FBuf[1] is a valid reference to a dynamic array.
Because of this the FData pointer has been updated to point to FBuf[1] (so far so good), and the reference count of FData has been increased by 1 (not good), also the runtime has grown the memory block holding the dynamic array to what it thinks is the correct size for FData (bad).

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 01 00 00  13 00 00 00  01  'S' 'n' .....
FData:    01 00 00 13  00 00 00 01  'S' .............. 

Oops FData now has a refcount of 318,767,105 and a length of 16,777,216 bytes.
FBuf also has its length increased, but its refcount is now 257.

This is why you need the call to SetLength to undo the massive overallocation of memory. This still does not fix the reference counts though.
The overallocation may cause out of memory errors (esp. on 64-bit) and the wacky refcounts cause a memory leak because your arrays will never get freed.

The solution
As per David's answer: enable typed checked pointers: {$TYPEDADDRESS ON}

You can fix the code by defining FData as a normal PAnsiChar or PByte.
If you make sure to always terminate your assignments to FBuf with a double zero FData will work as expected.

Make FData a TBuffer like so:

TBuffer = record
private
  FData : PByte;
  function GetLength: cardinal;
  function GetType: byte;
public
  class operator implicit(const A: TBytes): TBuffer;
  class operator implicit(const A: TBuffer): PByte;
  property Length: cardinal read GetLength;
  property DataType: byte read GetType;
end;

Rewrite CreateThing like so:

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + Sizeof(AThingType) + 2);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));
  FBuf[Lengh(FBuf)-1]:= 0;
  FBuf[Lengh(FBuf)-2]:= 0;  //trailing zeros for compatibility with pansichar

  FData := FBuf;  //will call the implicit class operator.
end;

class operator TBuffer.implicit(const A: TBytes): TBuffer;
begin
  Result.FData:= PByte(@A[1]);
end;

I don't understand all this mucking about trying to outsmart the compiler.
Why not just declare FData like so:

type
  TMyData = record
    DataType: byte;
    Buffer: Ansistring;  
    ....

And work with that.

Laurel
  • 5,965
  • 14
  • 31
  • 57
Johan
  • 74,508
  • 24
  • 191
  • 319
  • It will never work, he tries to make FData as a reference to the second element in FBuf. – whosrdaddy Aug 31 '16 at 12:36
  • @whosrdaddy - Yes - I am aware the issue is with the line `FData := @FBuf[1];` but I do not fully understand why. Further - adding Thing3, Thing4, and so on, works as "intended" – Hugh Jones Aug 31 '16 at 12:48
  • 2
    @HughJones, no it doesn't and it won't, hold on still working on it. – Johan Aug 31 '16 at 12:53
  • 1
    @Hugh it will *seem* to work for each `TThing.CreateThing` where the `AThingType` parameter is not `0` like Johan explains in his answer – fantaghirocco Aug 31 '16 at 13:00
  • @fantaghirocco - Got it. – Hugh Jones Aug 31 '16 at 13:04
  • No need for reverse engineering surely. – David Heffernan Aug 31 '16 at 13:14
  • @Hugh maybe you may like to consider the adoption of a record like this: `TThingInfo = record thingType: Byte; thingData: TBytes; end;` instead of an array – fantaghirocco Aug 31 '16 at 13:21
  • @DavidHeffernan, occupational hazard, still I thought it would be nice to know **why** it fails so spectacularly and what the nasty and hidden side effects of this sort of `trick the compiler` code are. – Johan Aug 31 '16 at 13:22
  • @david I am not sure which answer to accept - one is informative, the other is concise; I could argue it both ways. – Hugh Jones Aug 31 '16 at 13:31
  • @johan - I will probably go with a `PByte` rather than a `PAnsiChar`; My stream of bytes can contain all sorts. I need a new title for this question because the one I came up with was because I was guessing the issue was in the region of reference counting ... Suggestions anyone? I bet it's a duplicate – Hugh Jones Aug 31 '16 at 13:34
  • @fantaghirocco - I had that in a previous version of my project but abandoned it because it was misbehaving - now I know why I may go that way. – Hugh Jones Aug 31 '16 at 13:37
  • Use a wrapper view type rather than `PByte`. Then you can keep the length with that property and make the client code more useful. Accept whichever you like. I think my answer is informative as well as concise FWIW! Johan's answer is very good. Do accept it with my blessing! – David Heffernan Aug 31 '16 at 13:37
18

The problem can be seen readily by enabling type-checked pointers. Add this to the top of your code:

{$TYPEDADDRESS ON}

The documentation says:

The $T directive controls the types of pointer values generated by the @ operator and the compatibility of pointer types.

In the {$T-} state, the result of the @ operator is always an untyped pointer (Pointer) that is compatible with all other pointer types. When @ is applied to a variable reference in the {$T+} state, the result is a typed pointer that is compatible only with Pointer and with other pointers to the type of the variable.

In the {$T-} state, distinct pointer types other than Pointer are incompatible (even if they are pointers to the same type). In the {$T+} state, pointers to the same type are compatible.

With that change your program fails to compile. This line fails:

FData := @FBuf[1];

The error message is:

E2010 Incompatible types: 'System.TArray<System.Byte>' and 'Pointer'

Now, FData is of type TArray<Byte> but @FBuf[1] is not a dynamic array but rather a pointer to a byte in the middle of a dynamic array. The two are not compatible. By operating in the default mode where pointers are not type-checked, the compiler lets you commit this terrible mistake. Quite why this is the default mode is utterly beyond me.

A dynamic array is more than a pointer to the first element – there is also metadata such as length and reference count. That metadata is stored at an offset from the first element. Hence your entire design is flawed. Store the type code in a separate variable, and not as part of the dynamic array.

David Heffernan
  • 601,492
  • 42
  • 1,072
  • 1,490
  • (as you may have guessed) my question arises from a larger project in which I have to 'chop up' a much larger `TArray`. I must rethink my design, as you say. – Hugh Jones Aug 31 '16 at 13:10
  • 2
    Easy enough to create a class which looks like an array but is in fact mapping onto a region of an actual array. A class (or record) that presents a **view**. Anyway, let type checked pointers be your friend in the future! – David Heffernan Aug 31 '16 at 13:15
6

Dynamic arrays are pointers internally, and assignment-compatible with pointers; but the only correct pointers on the right side of assignment are nil or another dynamic array. FData := @FBuf[1]; is obviously wrong, but interesting that FData := @FBuf[0]; is probably OK, even if $TYPEDADDRESS is enabled.

The following code compiles and works as expected in Delphi XE:

program Project19;

{$APPTYPE CONSOLE}
{$TYPEDADDRESS ON}

uses
  SysUtils;

procedure Test;
var
  A, B: TBytes;

begin
  A:= TBytes.Create(11,22,33);
  B:= @A[0];
  Writeln(B[1]);
end;

begin
  try
    Test;
    readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Seems like the compiler "knows" that @A[0] is a dynamic array, not just a pointer.

kludg
  • 27,213
  • 5
  • 67
  • 118
-1
constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
var
  Buffer : array of Byte;
begin
  SetLength(Buffer, Length(AThingData) + Sizeof(AThingType));
  Buffer[0] := AThingType;
  Move(AThingData[0], Buffer[1], Length(AThingData));

  SetLength(FBuf, Length(Buffer));
  Move(Buffer[0], FBuf[0], Length(Buffer));
  SetLength(FData, Length(AThingData));
  Move(Buffer[1], FData[0], Length(AThingData));
end;
shyambabu
  • 169
  • 11
  • That works, except that my main design consideration was to minimise memory copying - hence how I came to be trying to re-reference portions of a `TBytes`; `ThingData` could potentially be huge in my real world example. – Hugh Jones Sep 01 '16 at 12:09
  • The question is a "why" question. Asker seeks to understand why the code behaves as it does. Such questions demand words and explanation. A code of l'y answer seldom fits the bill. This one utterly fails to answer the question. – David Heffernan Sep 02 '16 at 02:08