23

I've got some unexpected access violations for Delphi code that I think is correct, but seems to be miscompiled. I can reduce it to

procedure Run(Proc: TProc);
begin
  Proc;
end;

procedure Test;
begin
  Run(
    procedure
    var
      S: PChar;

      procedure Nested;
      begin
        Run(
          procedure
          begin
          end);
        S := 'Hello, world!';
      end;

    begin
      Run(
        procedure
        begin
          S := 'Hello';
        end);
      Nested;
      ShowMessage(S);
    end);
end;

What happens for me is that S := 'Hello, world!' is storing in the wrong location. Because of that, either an access violation is raised, or ShowMessage(S) shows "Hello" (and sometimes, an access violation is raised when freeing the objects used to implement anonymous procedures).

I'm using Delphi XE, all updates installed.

How can I know where this is going to cause problems? I know how to rewrite my code to avoid anonymous procedures, but I have trouble figuring out in precisely which situations they lead to wrong code, so I don't know where to avoid them.

It would be interesting to me to know if this is fixed in later versions of Delphi, but nothing more than interesting, upgrading is not an option at this point.

On QC, the most recent report I can find the similar #91876, but that is resolved in Delphi XE.

Update:

Based on AlexSC's comments, with a slight modification:

...

      procedure Nested;
      begin
        Run(
          procedure
          begin
            S := S;
          end);
        S := 'Hello, world!';
      end;

...

does work.

The generated machine code for

S := 'Hello, world!';

in the failing program is

ScratchForm.pas.44: S := 'Hello, world!';
004BD971 B89CD94B00       mov eax,$004bd99c
004BD976 894524           mov [ebp+$24],eax

whereas the correct version is

ScratchForm.pas.45: S := 'Hello, world!';
004BD981 B8B0D94B00       mov eax,$004bd9b0
004BD986 8B5508           mov edx,[ebp+$08]
004BD989 8B52FC           mov edx,[edx-$04]
004BD98C 89420C           mov [edx+$0c],eax

The generated code in the failing program is not seeing that S has been moved to a compiler-generated class, [ebp+$24] is how outer local variables of nested methods are accessed how local variables are accessed.

  • In my tests I got this warning "[DCC Warning] Unit1.pas(45): W1036 Variable '$frame' might not have been initialized". Since I didn´t declared any $frame variable I assume it was generated by the compiler when declaring the interfaces that implement the anonymous methods. The warning suggests that not everything was done correctly by the compiler, so it seems to be a bug. Changing the code to have the S variable declard as string makes the problem to show itself earlier. Debugging suggests the S variable wasn´t properly handled by the generated code. – AlexSC Sep 04 '13 at 11:07
  • @AlexSC The "might not have been initialized" detection is notoriously bad, there are tons of false positives that don't point to any real problem and don't affect generated code, so that's a warning that should be safe to ignore. I can also get that warning (including the `$frame` compiler-generated variable) in simpler code that does work correctly. –  Sep 04 '13 at 11:10
  • 1
    Compiles and works ok in XE2 – Sertac Akyuz Sep 04 '13 at 11:19
  • 1
    @hvd: the strangest thing is that if I add a line like S := ''; inside the empty anonymous method the exception doesn´t happen. That strongly suggests me that there is indeed a bug in the handling of the S variable; – AlexSC Sep 04 '13 at 11:20
  • @AlexSC Thanks, that's an interesting observation, and looking at the generated machine code for `S := 'Hello, world!';` with and without `S := '';` shows a difference. –  Sep 04 '13 at 11:25
  • 1
    @SertacAkyuz Damnit, the very next version. :) –  Sep 04 '13 at 11:26

2 Answers2

1

Without seeing the whole Assembler Code for the whole (procedure Test) and only assuming on the Snippet you posted, it's probably that on the failing Snippet only a Pointer has been moved where on the correct version there is some Data moved too.

So it seems that S:=S or S:='' causes the Compiler to create a reference by it's own and could even allocate some Memory, which would explain why it works then.

I also assume that's why a Access Violation occurs without S:=S or S:='', because if there is no Memory allocated for the String (remember you only declared S: PChar) then a Access Violation is raised because non-allocated Memory was accessed.

If you simply declare S: String instead, this probably won't happen.

Additions after extended Commenting:

A PChar is only a Pointer to Data Structure, that must exist. Also another common Issue with PChar is to declare local Variables and then passing a PChar to that Variable to other Procs, because what happens is that the local Variable is freed once the routine ends, but the PChar will still point to it, which then raise Access Violations once accessed.

The only possibility that exists per Documentation is declaring something like that const S: PChar = 'Hello, world!' this works because the Compiler can resolve a relative Adresse to it. But this only works for Constants and not for Variables like on the Example above. Doing it like in the Example above needs Storage to be allocated for the string literal to which the PChar then points to like S:String; P:PChar; S:='Hello, world!'; P:=PChar(S); or similar.

If it still fails with declaring String or Integer then perhaps the Variable disappears somewhere along or suddenly isn't visible anymore in a proc, but that would be another Issue that has nothing to do with the existing PChar Issue explained already.

Final Conclusion:

It's possible to do S:PChar; S:='Hello, world!' but the Compiler then simply allocates it as a local or global Constant like const S: PChar = 'Hello, world!' does which is saved into Executable, a second S := 'Hello' then creates another one which is also saved into Executable and so on - but S then only points to the last one allocated, all others are still in the Executable but not accessible any more without knowing the exact Location, because S only points to the last one allocated.

So depending which one was the last S points either to Hello, world! or Hello. On the Example above i can only guess which one was the last and who knows perhaps the Compiler can only guess too and depending on optimizations and other unpredictable Factors S could suddenly point to unallocated Mem instead of the last one by the Time Showmessage(S) is executed which then raises a Access Violation.

  • I assure you, I know how pointers work. I know that pointer variables don't contain the pointed-to data. But string literals aren't refcounted, they exist directly in the executable image, and there is no data that needs to be copied, just the pointer itself. I didn't use `string`, because non-managed types such as `PChar` produce simpler assembly, making for easier inspection of the generated assembly. But it does fail when using `string` as well, and in fact, you can see the problem even if you use a variable of type `Integer`. –  Apr 17 '15 at 17:15
  • Maybe it fails as well, but you still can't simply assign 'Hello, world!' to a PChar like that because it's only a Pointer to Data, but on the Example above there is no Data where it points to. If it still fails with declaring `String` or `Integer` then perhaps the Variable disappears somewhere along or suddenly isn't visible any more for a proc. A full ASM Source of the whole Proc would help reveal what really happens else step-trace trough and watch yourself where the Variable disappear or where exactly the Access Violation occurs to narrow it down. – Amenominakanushi Apr 17 '15 at 19:01
  • Like I said already, string literals aren't refcounted, they don't go away. You can keep a pointer to a string literal without any worries. That's not just an implementation detail, that's an explicit promise made in the documentation that's safe to rely on. [Read it here.](http://docwiki.embarcadero.com/RADStudio/XE/en/Internal_Data_Formats#Long_String_Types) As for inspecting the generated assembly, I did that already and I showed the relevant details in the question. The variable doesn't disappear, and I already found out precisely where the access violation occurs. –  Apr 17 '15 at 19:15
  • Yeah, but PChar itself isn't a string literal, it's a Pointer to a string literal or other Data Structure, that must exist. And i don't see any string literal on the Example above, there is only the PChar declared. Also another common Issue with PChar is to declare local Variables and then passing a PChar to that Variable to other Procs, what happens is that the local Variable is freed once the routine ends, but the PChar will still point to it, which then raise Access Violations once accessed. So it's not entirely true that you can keep a pointer to a string literal without any worries. – Amenominakanushi Apr 17 '15 at 19:43
  • If you don't know what a string literal is, please look it up. If you do know what a string literal is: please re-read your comment, because it doesn't make any sense. –  Apr 17 '15 at 19:50
  • In contrary it makes sense, because where do you think does your string literal 'Hello, world!' go? Nowhere, so there no string literal. The only possibility that exists per Documentation is declaring something like that `const S: PChar = 'Hello, world!'` this works because the Compiler can resolve a relative Adresse to it. But this only works for Constants and not for Variables like on the Example above. Doing it like you do in the Example need Storage to be allocated for the string literal to which the PChar then points to like `S:String; P:PChar; S:='Hello, world!'; P:=S;` or similar. – Amenominakanushi Apr 17 '15 at 20:18
  • @Amen - Why don't you compile the program and then search for 'Hello' in the executable? You will find it. Alternatively, you may read the documentation link hvd posted. – Sertac Akyuz Apr 17 '15 at 21:15
  • @sertac-akyuz - Just tested it and you are right it has been allocated and i found it with Hiew in the Executable, but assigning different strings to it also caused to save all of them separately in the Executable, but only the last was accessible, so still not a good Idea to do. So apparently it causes the Compiler to create Constants which are then all stored in the Executable, but the Pointer logically points only to the last, the others are fubar. – Amenominakanushi Apr 17 '15 at 21:37
  • @Amen - It should prove you that when a variable is assigned a string literal no storage is allocated for character data. – Sertac Akyuz Apr 17 '15 at 21:55
  • Yeah but its not so different as the possibility already mentioned, it's somehow like `const S: PChar = 'Hello, world!'` because for each one assigned with `S:='whatever'` a Constant is saved into the Executable, so the Problem is that assigning different ones like that creates multiple Constants but logically only the last one is accessible via S. All the others created before are only waste. – Amenominakanushi Apr 17 '15 at 22:08
0

How can I know where this is going to cause problems?

It's hard to tell at this point in time.
If we knew the nature of the fix in Delphi XE2 we'd be in a better position.
All you can do is refrain from using anonymous functions.
Delphi has had procedural variables, so the need for anonymous functions ready is not that dire.
See http://www.deltics.co.nz/blog/posts/48.

It would be interesting to me to know if this is fixed in later versions of Delphi

According to @Sertac Akyuz this has been fixed in XE2.

Personally I dislike anonymous methods and have had to ban people from using them in my Java projects because a sizable proportion of our code base was going anonymous (event handlers).
Used in extreme moderation I can see the use case.
But in Delphi where we have procedural variables and nested procedures... Not so much.

Johan
  • 74,508
  • 24
  • 191
  • 319
  • Sorry, but this does nothing to answer my question. "That unknowable obviously."? Really? It's a bug that happens for certain combinations of nested functions and anonymous functions, and someone with more skills or knowledge than me would certainly be capable of describing that "certain combinations" in more detail, which is what I'm asking. There's no need to avoid them entirely. –  Oct 06 '13 at 17:04
  • 1
    I have one use case where they are tricky to avoid without complicating the code: a function that returns a `reference to procedure` that gets passed around and ends up called by other code. The compiler-generated helper object gets freed automatically, because the `reference to procedure` type is reference-counted. With `procedure of object`, I would need to manually add a lot of extra code to ensure that object gets freed. The only practical alternative would be to manually define an `interface` and have the helper class implement it. Java has garbage collection, so it's not a problem there. –  Oct 06 '13 at 17:06