
I am having trouble getting the length of a String that Delphi passes to an FPC DLL. This is odd, because I can get the String back from the DLL, but I can't get its length.

Delphi:

program Project2;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

function Test(const S: String): Integer; cdecl; external 'c:\Project1.dll';

var
    A: String;
begin 
    A := 'test';
    WriteLn(Test(A)); // 1 ?
    ReadLn;
end.

FPC:

library project1;

{$mode ObjFPC}{$H+}

uses
  Classes;

function Test(const A: String): Integer; cdecl; export;
begin
 Result := Length(A);
end;

exports Test;

end.
user3060326

2 Answers


String in Delphi 2009+ is UnicodeString, and AnsiString in earlier versions.

String in FPC maps to AnsiString under {$H+} in the ObjFPC and Delphi modes; it does not map to UnicodeString there. And AFAIK, FPC's string types are not binary compatible with Delphi's string types anyway. So you cannot pass a Delphi AnsiString to an FPC AnsiString or vice versa, and the same goes for UnicodeString.

You should not be passing String values over the DLL boundary anyway, especially when multiple compilers are involved, and especially since you are not using FPC's Delphi mode. You need to redesign your DLL to be more portable, e.g.:

FPC:

library project1;

{$mode ObjFPC}
{$H+}

uses
  Classes;

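// ANSI entry point: Length() on a PAnsiChar counts bytes up to the #0 terminator.
function TestA(const A: PAnsiChar): Integer; cdecl; export;
begin
  Result := Length(A);
end;

// Wide entry point: Length() on a PWideChar counts UTF-16 code units up to the #0 terminator.
function TestW(const A: PWideChar): Integer; cdecl; export;
begin
  Result := Length(A);
end;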

exports TestA, TestW;

end.

Delphi:

program Project2;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

function Test(const S: PChar): Integer; cdecl; external 'Project1.dll' name {$IFDEF UNICODE}'TestW'{$ELSE}'TestA'{$ENDIF};

var
  A: String;
begin 
  A := 'test';
  WriteLn(Test(PChar(A)));
  ReadLn;
end.
Remy Lebeau
  • There is no conversion performed, and no performance penalty. Casting a `UnicodeString` to a `PWideChar` merely retrieves a pointer to the `UnicodeString`'s character data, or a pointer to a static `#0` character if the `UnicodeString` is empty. – Remy Lebeau Feb 25 '14 at 20:14
  • It doesn't seem that important that string is always AnsiString in FPC and either AnsiString or UnicodeString in Delphi. It would be better to stress that FPC.AnsiString is not equal to Delphi.AnsiString. – David Heffernan Feb 25 '14 at 20:26
  • @RemyLebeau Still, I benchmarked a function in Delphi, and calling it with PWideChar instead of String adds some latency. So obviously this approach isn't without its downsides. – user3060326 Feb 25 '14 at 21:14
  • @RemyLebeau `PAnsiChar` is a lot faster though. Not sure why. – user3060326 Feb 25 '14 at 21:28
  • @user The downside of the alternative is that it doesn't work. – David Heffernan Feb 25 '14 at 21:29
  • @user Your benchmark is private. Hard to comment on it. Is the parameter passing your bottleneck? – David Heffernan Feb 25 '14 at 21:32
  • @RemyLebeau Can you explain why `{$mode delphiunicode}` works? – user3060326 Feb 25 '14 at 22:15
  • @user3060326: Calling `Length()` on a `PChar` is always going to be slower than calling `Length()` on a `String`. `Length(PChar)` has to count the individual characters until `#0` is found, whereas `Length(String)` simply returns the string's internal length field as-is and does not have to count anything. – Remy Lebeau Feb 25 '14 at 22:57
  • @user3060326: `{$MODE DelphiUnicode}` (which is not documented and not guaranteed to work) merely tells FPC to map `String` to `UnicodeString` instead of `AnsiString` for code compatibility with Delphi 2009+. Nothing more. It is NOT a solution to the problem of **correct** interop between FPC and Delphi. – Remy Lebeau Feb 25 '14 at 23:02
  • @RemyLebeau It also enables other features such as record helpers etc. Which objfpc mode doesn't allow. – user3060326 Feb 25 '14 at 23:08
  • Switching to `PChar`s w/o length indicator will introduce a potential buffer overflow vulnerability. – Free Consulting Feb 26 '14 at 11:03
  • @DavidHeffernan: I think he was referring to the DLL side. Yes, on the caller side, `PChar(String)` guarantees a null terminator. But on the called side, a `PChar` parameter is not guaranteed to be null terminated, depending on what the caller actually passes in. When dealing with `PChar` pointers being passed around, it is best to also pass around the length of the buffer that the `PChar` is pointing to. In this particular situation, that negates the user's example. But in most real world situations, it is usually possible and preferred. – Remy Lebeau Feb 26 '14 at 17:35
  • No. You've got that wrong. Only need to pass length if there is danger of buffer overrun. And there is only such danger when writing to the buffer. This is an IN param. Therefore no length needed. Ask yourself why, for instance, CopyFile does not receive length params. – David Heffernan Feb 26 '14 at 17:40
  • @DavidHeffernan: **IN THIS EXAMPLE**, the `PChar` is an IN parameter, and is expected to be null-terminated. But that is not always the case in all situations where a `PChar` parameter is used. And it is possible to pass around a `PChar` to a buffer that is not null terminated, so even reading **CAN** go out of bounds if you are not careful. As for `CopyFile()`, it expects null-terminated character pointers, so it does not need to be told the length. I am not stupid. – Remy Lebeau Feb 26 '14 at 17:45
  • You only have risk of buffer overrun when you are going to write to a buffer. Only then does passing length offer protection. For the situation at hand, there is no danger of buffer overrun and you and @Free are giving poor advice by suggesting that a length param would be beneficial here. Perhaps you did not mean to suggest that. So I have made the point explicitly. – David Heffernan Feb 26 '14 at 17:49
  • @DavidHeffernan: **IN THE EXAMPLE FOR THIS QUESTION**, a null terminator is guaranteed by `PChar(String)`. But **IN GENERAL**, a null terminator is not guaranteed, eg: `var Buffer: array[0..5] of Char; Buffer is filled without a null terminator; SomeFuncThatTakesAPChar(@Buffer[0]);` A buffer overrun can occur if the function expects a null terminator and is not given one. Passing the buffer length would avoid that: `SomeFuncThatTakesAPChar(@Buffer[0], Length(Buffer))`. That is all I am saying. It does not apply **TO THE EXAMPLE IN THIS QUESTION**, but it can apply to other situations. – Remy Lebeau Feb 26 '14 at 18:17
  • We are in complete agreement now that you have clarified what you meant. – David Heffernan Feb 26 '14 at 18:29
  • String, btw, can be shortstring (TP mode), ansistring (Delphi mode and with $H+ in some other modes) and unicodestring (mode delphiunicode, only in trunk). Units from various modes can be used in one program, but a mode is a per-unit decision. – Marco van de Voort Mar 01 '14 at 17:49
  • @MarcovandeVoort: Delphi users got in an uproar about mixing 0-based and 1-based strings together in the same project, but at least `String` maps to a single type in all units. I can't imagine having `String` potentially map to different types in the same project. – Remy Lebeau Mar 01 '14 at 18:27
  • It's nothing but a minor progression of $H aka $LONGSTRINGS. And yes, the 0-based string type is hare-brained. Yes, if you start over, 0-based could be discussed (I still wouldn't be in favor, but I don't feel really strongly about it), but introducing it in a 20 year old series of compiler projects with legacy from here to eternity, /madness/. Probably I should update http://www.stack.nl/~marcov/delphistringtypes.txt with some more :-) – Marco van de Voort Mar 01 '14 at 19:20
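
The pointer-plus-length pattern discussed in the comments above could look roughly like this on the FPC side. This is only a sketch: `TestWLen` and its `MaxLen` parameter are hypothetical names that do not appear in the original answer, and the function is not needed for the question's own example, where `PChar(String)` already guarantees a terminator.

```pascal
library project1;

{$mode ObjFPC}{$H+}

// Hypothetical variant of TestW (not part of the original answer).
// The caller passes the buffer size in characters, so the scan stops at
// MaxLen even when the buffer contains no #0 terminator.
function TestWLen(const A: PWideChar; MaxLen: Integer): Integer; cdecl;
begin
  Result := 0;
  if A = nil then
    Exit;
  // Short-circuit evaluation: A[Result] is only read while Result < MaxLen.
  while (Result < MaxLen) and (A[Result] <> #0) do
    Inc(Result);
end;

exports TestWLen;

end.
```

A caller holding a fixed-size buffer would pass something like `TestWLen(@Buffer[0], Length(Buffer))`, as in the comment above.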

You cannot use string across this module boundary. The Delphi type is simply different from the FPC type. It is true that they have the same name but that does not make them the same type.

In fact, even if both modules were compiled with the same compiler, they would be different types, allocated off different heaps and not valid for interop. In Delphi you could use ShareMem and the exact same compiler version, but that is pretty constraining.

Use an interop friendly type such as PWideChar for UTF-16 or PAnsiChar for UTF-8. That way your library is not constrained and can interop with anything.
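
As a rough illustration of the UTF-8 route, the Delphi side might look like the sketch below. It assumes the DLL exports a `PAnsiChar` function such as the `TestA` shown in the other answer; the program name `Utf8Caller` and the variable `U` are made up for this example.

```pascal
program Utf8Caller;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Assumes the FPC library exports TestA(PAnsiChar) as in the other answer.
function TestA(const S: PAnsiChar): Integer; cdecl; external 'Project1.dll';

var
  A: String;
  U: UTF8String;
begin
  A := 'test';
  // Convert the Delphi string to UTF-8 and keep it in a variable so the
  // buffer stays alive for the duration of the call.
  U := UTF8Encode(A);
  // PAnsiChar(U) points at the null-terminated UTF-8 bytes.
  WriteLn(TestA(PAnsiChar(U)));
  ReadLn;
end.
```

Note that the DLL then sees bytes rather than characters, so for non-ASCII text the returned length is a byte count.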

David Heffernan
  • @Ritsaert That's what I said I think. – David Heffernan Feb 25 '14 at 20:16
  • @Ritsaert I never even said the types were the same now. In fact I said that they are different full stop. You can only make them the same by forcing identical compiler and shared memory manager. I can't see what the problem is. – David Heffernan Feb 25 '14 at 20:24
  • @DavidHeffernan The solution is `{$mode delphiunicode}`. And I can have native String interop. – user3060326 Feb 25 '14 at 21:51
  • @user No you cannot. What I state in my answer is accurate. – David Heffernan Feb 25 '14 at 21:52
  • @Ritsaert The types are always different. They are never the same. I don't see the point of hypothesising about a scenario that never occurs. – David Heffernan Feb 25 '14 at 22:00
  • @DavidHeffernan Well it seems to be working with that flag. I use a custom Pos function from FPC DLL in Delphi. And it returns the same as if its called in Delphi. – user3060326 Feb 25 '14 at 22:02
  • @user If that's good enough for you to be convinced then good luck to you – David Heffernan Feb 25 '14 at 22:03
  • @DavidHeffernan That said, the same function is 90ms faster in 64-bit FPC with SSE4.2 than in 64-bit XE5. – user3060326 Feb 25 '14 at 22:22
  • These timings are meaningless since your benchmark is private. I'm not sure why you are asking for our help. – David Heffernan Feb 25 '14 at 22:23
  • @RitsaertHornstra, what you are saying is simply not true. The internal memory layout has been documented on both sides. Both sides publish technotes when they introduce a breaking change. – Free Consulting Feb 26 '14 at 11:06
  • @RitsaertHornstra, did you try Google? http://docwiki.embarcadero.com/RADStudio/XE4/en/Internal_Data_Formats#Long_String_Types – Free Consulting Feb 26 '14 at 22:45
  • @Free Consulting: You're right. They did spec the internal format, since D2010. Before that they show what the format was but with a disclaimer that it is undocumented. Since they removed that I assume this is a spec now hence rendering my argument moot. NB: I did try Google but apparently I used the wrong keywords. – Ritsaert Hornstra Feb 26 '14 at 23:28
  • @RitsaertHornstra, no, they didn't. Information on the memory layout of strings has always been published. – Free Consulting Feb 27 '14 at 07:09
  • Even knowing the layout, it is subject to change from version to version, and it is allocated off a different heap. – David Heffernan Feb 27 '14 at 07:10