Strange behavior of move with strings

Question

I am testing some enhanced string related functions with which I am trying to use move as a way to copy strings around for faster, more efficient use without delving into pointers.

While testing a function for making a delimited string from a TStringList, I encountered a strange issue. The compiler referenced the bytes contained through the index when it was empty and when a string was added to it through move, index referenced the characters contained.

Here is a small downsized barebone code sample:-

unit UI;

interface

uses
  System.SysUtils, System.Types, System.UITypes, System.Rtti, System.Classes,
  System.Variants, FMX.Types, FMX.Controls, FMX.Forms, FMX.Dialogs, FMX.Layouts,
  FMX.Memo;

type
  TForm1 = class(TForm)
    Results: TMemo;
    procedure FormCreate(Sender: TObject);
  end;

var
  Form1: TForm1;

implementation

{$R *.fmx}

function  StringListToDelimitedString
          ( const AStringList: TStringList; const ADelimiter: String ): String;
var
  Str           : String;
  Temp1         : NativeInt;
  Temp2         : NativeInt;
  DelimiterSize : Byte;

begin

  Result        := ' ';
  Temp1         := 0;
  DelimiterSize := Length ( ADelimiter ) * 2;

  for Str in AStringList do
    Temp1 := Temp1 + Length ( Str );

  SetLength ( Result, Temp1 );
  Temp1     := 1;

  for Str in AStringList do
  begin

    Temp2 := Length ( Str ) * 2;

    // Here Index references bytes in Result
    Move  ( Str [1],        Result [Temp1], Temp2 );

    // From here the index seems to address characters instead of bytes in Result
    Temp1 := Temp1 + Temp2;
    Move  ( ADelimiter [1], Result [Temp1], DelimiterSize );    
    Temp1 := Temp1 + DelimiterSize;

  end;

end;

procedure TForm1.FormCreate(Sender: TObject);
var
  StrList : TStringList;
  Str     : String;

begin

  // Test 1 : StringListToDelimitedString

  StrList := TStringList.Create;
  Str     := '';

  StrList.Add ( 'Hello1' );
  StrList.Add ( 'Hello2' );
  StrList.Add ( 'Hello3' );
  StrList.Add ( 'Hello4' );

  Str := StringListToDelimitedString ( StrList, ';' );
  Results.Lines.Add ( Str ); 
  StrList.Free;

end;

end.

Please devise a solution and if possible, some explanation. Alternatives are welcome too.

Solution seems to be `Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4']).Join(';');` — Arioch 'The, Nov 06 '13 at 09:25
Not really the answer to the question, but Delphi always had pretty good string performance due to it's use of reference counting and copy-on-write. Are you sure your methods will provide any benefit over this? — Matt Allwood, Nov 06 '13 at 09:26
Its a simpler version of the original method, which has other features suited to my usual works. — Umair Ahmed, Nov 06 '13 at 09:30
BTW, why do you add delimiter AFTER the LAST string ? aren't delimiters supposed to be only BETWEEN the strings ? — Arioch 'The, Nov 06 '13 at 10:28
I am writing a better version, now inside my own TStringlist class which is almost the same as the native one with less additional method calls, some methods are now inlined, verbose exception descriptions and other optimizations I learned on SO. I will also explore the record type enumerator David mentioned for it. — Umair Ahmed, Nov 06 '13 at 10:43
@UmairAhmed Are you timing anything? Without timing, there's often little point in optimising. — David Heffernan, Nov 06 '13 at 11:21
Well timing is something I haven't gotten around quiet well, using stopwatch and all. I have now resorted to resolving such through the asm generated. I don't have much experience with x86 assembly, rather I have worked quiet a lot with microcontroller assembly. — Umair Ahmed, Nov 06 '13 at 12:44

David Heffernan · Accepted Answer · 2013-11-06T10:20:30.520

Let's look at the crucial bit of code:

// Here Index references bytes in Result
Move  ( Str [1],        Result [Temp1], Temp2 );

// From here the index seems to address characters instead of bytes in Result
Temp1 := Temp1 + Temp2;
Move  ( ADelimiter [1], Result [Temp1], DelimiterSize );

Now, some explanations. When you index a string, you are always indexing characters. You are never indexing bytes. It looks to me as though you wish to index bytes. In which case using the string index operator makes life hard. So I suggest that you index bytes as follows.

First of all initialise Temp1 to 0 rather than 1 since we will be using zero-based indexing.

When you need to index Result using a zero-based byte index, do so like this:

PByte(Result)[Temp1]

So your code becomes:

Temp1 := 0;
for Str in AStringList do
begin
  Temp2 := Length(Str)*2;
  Move(Str[1], PByte(Result)[Temp1], Temp2);
  Temp1 := Temp1 + Temp2;
  Move(ADelimiter[1], PByte(Result)[Temp1], DelimiterSize);    
  Temp1 := Temp1 + DelimiterSize;
end;

In fact I think I'd write it like this, avoiding all string indexing:

Temp1 := 0;
for Str in AStringList do
begin
  Temp2 := Length(Str)*2;
  Move(Pointer(Str)^, PByte(Result)[Temp1], Temp2);
  Temp1 := Temp1 + Temp2;
  Move(Pointer(ADelimiter)^, PByte(Result)[Temp1], DelimiterSize);    
  Temp1 := Temp1 + DelimiterSize;
end;

I'd suggest better names than Temp1 and Temp2. I also question the use of NativeInt here. I'd normally expect to see Integer. Not least because a Delphi string is indexed by signed 32 bit values. You cannot have a string with length greater than 2GB.

Note also that you are not allocating enough memory. You forgot to account for the length of the delimiter. Fix that and your function looks like this:

function StringListToDelimitedString(const AStringList: TStringList;
  const ADelimiter: String): String;
var
  Str: String;
  Temp1: Integer;
  Temp2: Integer;
  DelimiterSize: Integer;
begin
  Temp1 := 0;
  DelimiterSize := Length(ADelimiter) * SizeOf(Char);

  for Str in AStringList do
    inc(Temp1, Length(Str) + DelimiterSize);

  SetLength(Result, Temp1);
  Temp1 := 0;
  for Str in AStringList do
  begin
    Temp2 := Length(Str) * SizeOf(Char);
    Move(Pointer(Str)^, PByte(Result)[Temp1], Temp2);
    inc(Temp1, Temp2);
    Move(Pointer(ADelimiter)^, PByte(Result)[Temp1], DelimiterSize);
    inc(Temp1, DelimiterSize);
  end;
end;

If you want to avoid pointers, then write it like this:

function StringListToDelimitedString(const AStringList: TStringList;
  const ADelimiter: String): String;
var
  Str: String;
  StrLen: Integer;
  ResultLen: Integer;
  DelimiterLen: Integer;
  ResultIndex: Integer;
begin
  DelimiterLen := Length(ADelimiter);

  ResultLen := 0;
  for Str in AStringList do
    inc(ResultLen, Length(Str) + DelimiterLen);

  SetLength(Result, ResultLen);

  ResultIndex := 1;
  for Str in AStringList do
  begin
    StrLen := Length(Str);
    Move(Pointer(Str)^, Result[ResultIndex], StrLen*SizeOf(Char));
    inc(ResultIndex, StrLen);
    Move(Pointer(ADelimiter)^, Result[ResultIndex], DelimiterLen*SizeOf(Char));
    inc(ResultIndex, DelimiterLen);
  end;
end;

So I have to resort to pointers after all... the names were the consequence of haphazard simplification of the original method... and you didn't explain as to why does it first reference bytes as move works for normal string copying from array of bytes in my other libraries? — Umair Ahmed, Nov 06 '13 at 09:28
You don't have to resort to pointers. You can use string indexing. But then you'd need to switch to character based indexing rather than byte based. Can you elaborate a little more on what you say that I did not explain. I don't follow. — David Heffernan, Nov 06 '13 at 09:29
@Arioch'The I know, as I said its just a simpler version of the original. — Umair Ahmed, Nov 06 '13 at 09:33
@DavidHeffernan Ah I get it, my bad, thanks for the answers though. — Umair Ahmed, Nov 06 '13 at 09:34
@DavidHeffernan I know its off-topic but which would be better, manipulating bytes or chars? and should I use simple for instead of for-in to forgo the enumerator overhead? — Umair Ahmed, Nov 06 '13 at 09:39
I guess you should do some timings to answer these questions. The enumerator is an overhead for sure. The supplied enumerators are heap allocated which I think sucks. When I write them I use records, value types, that can be allocated on the stack. Of course, it may not matter. Only timing would let you know if it matters. And timing is quite hard to do well. Is this function even a bottleneck in your program? As for byte vs char, you may have a preference, but performance may influence that. Again, measure. — David Heffernan, Nov 06 '13 at 09:46

Arioch 'The · Answer 2 · 2013-11-06T11:04:05.013

System.Move works with untyped pointers and counter of bytes. System.Copy and SysUtils.StrLCopy work with strings (Pascal strings and C strings respectively) and counter of chars. But char and byte are different types, so when you move from string/char context to pointers/bytes context - you should re-calculate length in chars to length in bytes. By the way, same is about indices, Result [Temp1] calculates in characters, not in bytes. And always did.

Correct solution is not mixing citizens of different planets. If you want pointers - use pointers. If you want characters and strings - use characters and strings. But do not mix them! divide and conquer and always separate and make clear when you're using raw piinters and where you use typed strings! Otherwise you're misleading yourself;

function  StringListToDelimitedString
          ( const AStringList: TStrings; const ADelimiter: String ): String;
var
  Str           : array of String;
  Lengths       : array of Integer;
  Temp1         : NativeInt;
  Count, TotalChars : Integer;

  PtrDestination: PByte;
  PCurStr: ^String;
  CurLen: Integer;

  Procedure  Add1(const Source: string);
  var count: integer; // all context is in bytes, not chars here!
      Ptr1, Ptr2: PByte; 
  begin
    if Source = '' then exit;
    Ptr1 := @Source[ 1 ];
    Ptr2 := @Source[ Length(Source)+1 ];
    count := ptr2 - ptr1;

    Move( Source[1], PtrDestination^, count);
    Inc(PtrDestination, count);
  end;

begin // here all context is in chars and typed strings, not bytes
  Count := AStringList.Count;
  if Count <= 0 then exit('');

  SetLength(Str, Count); SetLength(Lengths, Count);
  TotalChars := 0; 
  for Temp1 := 0 to Count - 1 do begin
      PCurStr  := @Str[ Temp1 ]; 
      PCurStr^ := AStringList[ Temp1 ]; // caching content, avoiding extra .Get(I) calls
      CurLen := Length ( PCurStr^ ); // caching length, avoind extra function calls
      Lengths[ Temp1 ] := CurLen;
      Inc(TotalChars,  CurLen);
  end;

  SetLength ( Result, TotalChars + ( Count-1 )*Length( ADelimiter ) );

  PtrDestination := Pointer(Result[1]); 
  // Calls UniqueString to get a safe pointer - but only once 

  for Temp1 := Low(Str) to High(Str) do
  begin
    Add1( Str[ Temp1 ] );
    Dec( Count );
    if Count > 0 // not last string yet
       then Add1( Delimeter );
  end;
end;

Now, the correct solution i believe would be stopping inventing bicycles and using ready-made and tested libraryes, for example.

 Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4']).Join(';');

Or, if you really need to add a delimiter PAST THE LAST string (which is usually carefully avoided) then

 Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4', '']).Join(';');

The original claim of squeezing single percents of CPU power just does not hold with the original code. The illusion of fast pointer operations is just shadowed by a suboptimal code caring not about performance at all.

function  StringListToDelimitedString
          ( const AStringList: TStringList; const ADelimiter: String ): String;

TStringList is a class. Class instance creation and deletion are expensive (slow) operations. Delphi made a flexible framework of those classes - but the speed suffers. So if you want to get few extra percents of speed and pay with sacrificing reliability and flexibility - the don't use classes.

DelimiterSize : Byte;

It should be NativeInt instead just as the rest of muneric variables there. You think you just saved few bytes - but you forced CPU to use non-native datatype and insert typecasts every now and then. It is nothing but an explicitly introduced delay. Ironically, you even did not saved those bytes, cause Delphi would just pad three bytes more to allocate next variable on 32-bits boundary. That is a typical "memory alignment" optimization.

Result        := ' ';

This value would never be used. So it is just a loss of time.

for Str in AStringList do

This construction, requiring instantiating TInterfacedObject and calling its virtual methods and then reference-counting it with global locking - is expensive (slow) operation. And twice slow in multithreading taskloads. If you need to squeeze few percetns of speed - you should avoid loosing tens of percents on for-in loops. Those hi-level loops are handy and reliable and flexible - but they pay with speed for that.

 for Str in AStringList do

Then You do it twice. But you DON'T KNOW how it that stringlist implemented. How efficiently does it gets the string ? It may even pass messages to another process like TMemo.Lines do! So you should minimize all accesses to that class and its multitude of internal virtual members. Cache all the strings ONCE in some local variable, do not fetch TWICE every of those!

 Move  ( Str [1],        Result [Temp1], Temp2 );

Now we come to a really interesting question - is there even hypothetical place to gain any speed advantage by usiong pointers and bytes? Open CPU window and look how that line is actually implemented!

Strings are reference-counted! When you do Str2 := Str1; no data is copied but only pointers do. But as you start accessing the real memory buffer inside the string - that Str[1] expression - the compiler can not more count references, so Delphi is forced here to decrease reference coutner to ONE SINGLE. That is, Delphi is forced here to call UniqueString over Str and over Result; the System.UniqueString checks the refcounter and if it is >1 makes a special local copy of string (copying all the data into a newly allocated special buffer). Then you do a Move - just like Delphi RTL does itself. I cannto get where any advantage of speed may come from?

 Move  ( ADelimiter [1], Result [Temp1], DelimiterSize )

And here the same operations are done yet again. And they are costly operations! At least an extra procedure is called, at worst the new buffer is allocated and all the content being copied.

Resume:

The boundary between reference-counted strings and raw pointers is a costly one and every time you cross it - you force Delphi to pay a price.
Mixing those boundaries in the same code make the price paid again and again and again. It also confuses yourself where your counters and indices refer to bytes and where they refer to chars.
Delphi optimized casual string operations for years. And did a pretty good job there. Outperforming Delphi is possible - but you would need to understand in very very fine details - up to each CPU assembler instruction - what goes under the curtains of Pascal sources in your program. That is a dirty and tedious work. There will be no mopre that luxury of using those reliable and flexible things as for-in loop and TStrings classes.
In the end you would most probably get a few percents of speed gain, that no one would ever notice. But you will pay for that with a code much harder to understand, write, read and test. Will those few percents of speed worth unmaintainable code ? I doubt it.

So unless you are forced to do so, my advice is to skip wasting your time and just do a usual Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4']).Join(';'); Reliability and flexibility is almost always more preferable than sheer speed.

And sorry to tell that, while i do not know a lot about speed optimizations, i easily saw a speed-damaging issues in your code, that you intended to be faster than Delphi itself. My experience is miles and miles away of even trying to outperform Delphi in strings field. And i do not think you have any chances other but wasting a lot of time to finally get performance worse than stock one.

Your answer is great too, if only there was a multiple good answer option... Thanks for the great suggestions which I never heard of nor thought of, like the nativeint one. — Umair Ahmed, Nov 06 '13 at 10:02
I always prefer to use limiting data types to ensure only plausible values so I can capture any such errors. What I have understood is that nativeint is preferred instead of integer in cross-platform apps. — Umair Ahmed, Nov 06 '13 at 10:11
@DavidHeffernan, I made some performance tests in loops in x64 and `NativeInt` were generally better optimized than using `Integer`. — LU RD, Nov 06 '13 at 10:11
@UmairAhmed I would not say that. `NativeInt` is pointer sized so if, for whatever reason, you cast a pointer to an integer, use `NativeInt`. Note also that Delphi strings have signed 32 bit indices. You cannot have a string whose length is greater than 2GB. — David Heffernan, Nov 06 '13 at 10:18
@LURD That surprises me. I'd expect that the extra cache demand from using 8 bytes per value vs 4 bytes would make `NativeInt` perform worse in real world scenarios. — David Heffernan, Nov 06 '13 at 10:19
@DavidHeffernan, somehow the compiler could optimize the use of the cpu registers better with `NativeInt`. If the compiler was more clever, the integer type would be on pair at least. — LU RD, Nov 06 '13 at 10:27
@DavidHeffernan i used NativeInt here because topicstarter used it. Personalyl i also usually use integer. OTOG i suspect in Win64 code Integer would be slower than Int64 just liek Byte is slower then Integer in Win32 code. The point was not "LongInt vs NativeInt" but an avoiding of using Byte. — Arioch 'The, Nov 06 '13 at 10:31
*I always prefer to use limiting data types to ensure only plausible values so I can capture any such errors* where is the code, checking for those errors and capturing it ? I cannot see it. So by using byte you do not "check" anything here... And if you WOULD add a check, then it does not matter if to check against byte or against integer. I don't believe your actual maximum-value would be 255 exactly, so declare it `byte` or `SmallInt` or `LongInt` - you would still have to add the check explicitly as an extra statement. And that means "byte" gives you no benefit but a trouble. — Arioch 'The, Nov 06 '13 at 10:33
@Arioch'The The longest possible string has length `high(Integer)`. So using `Integer` is fine. No checks needed. — David Heffernan, Nov 06 '13 at 10:34
@DavidHeffernan NativeInt is CPU register. CPUs are optimized on operating with registers as a whole object, not its split parts. Think about register renaming, register windows and other methods. Int32 is worse optimized in Win64 not because Delphi is stupid, but just because it does not fit hardware. Same reason as Byte is suboptimal in Win32 mode. — Arioch 'The, Nov 06 '13 at 10:36
@David that was a remark - see the quotation - to the topicstarter. However you made me wonder if we have to detect overflowing when computing total length. — Arioch 'The, Nov 06 '13 at 10:37
@Arioch'The If you want to make the function robust then yes you would have to do so. — David Heffernan, Nov 06 '13 at 10:48
@LURD then add here Delphi/LLVM and different ARM implementations and it gets ever more undefinable :-) — Arioch 'The, Nov 06 '13 at 11:11
Yes, going for optimal solutions on every platform will cost lots of time testing. In the end one have to ask what is good enough. Investing time to improve the TStringList can be a learning project to understand details, but to introduce such code into real projects will be risky. — LU RD, Nov 06 '13 at 12:38
The problem is that the fmx framework still has quiet a lot of areas necessitating optimization (maybe the level described here) and rather than waiting for EMB to fix it, like many of us are (and then the usual anger on regressions), I like to get that over with. As unlike many, I do have a successfully deployed service based on it, which is getting me good money every month. — Umair Ahmed, Nov 06 '13 at 12:51

Strange behavior of move with strings

2 Answers2