14

I've been looking through the TDataset class and its string fields in Delphi XE2, and noticed that AsWideString returns a UnicodeString. However, it gets the value from the function TField.AsString: String, which in turn calls TField.AsAnsiString: AnsiString. Does that mean any Unicode characters would be lost? Also, the buffer that is passed to TDataset.GetFieldData is declared as an array of AnsiChar.

Am I understanding this correctly?

BenMorel
There is no spoon
  • +1 Since this behavior is IMHO a wrong VCL implementation. It is IMHO wrong naming, *inconsistent with the rest of the VCL/RTL* and a source of a lot of confusion/misunderstanding. Your question makes perfect sense. – Arnaud Bouchez Apr 22 '13 at 05:21

2 Answers

13

No, you should be examining the TWideStringField class, which is for Unicode fields, and the TStringField class, which is for non-Unicode strings. TField is just a base class, and TField.GetAsWideString is a virtual method with a fallback implementation that is overridden by descendants that are Unicode-aware.
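The fallback-plus-override pattern described above can be sketched as a small console program. This is a simplified model and not the actual Data.DB source: the class names `TSketchField`, `TSketchStringField` and `TSketchWideStringField` are hypothetical stand-ins for TField, TStringField and TWideStringField, and the `Sample` text is made up for illustration.

```delphi
program FieldFallbackSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TField: the base class only knows the Ansi path.
  TSketchField = class
  protected
    function GetAsAnsiString: AnsiString; virtual; abstract;
    // Fallback implementation: widen whatever the Ansi getter returns.
    function GetAsWideString: UnicodeString; virtual;
  public
    property AsWideString: UnicodeString read GetAsWideString;
  end;

  // Hypothetical stand-in for TStringField: stores Ansi data, keeps the fallback.
  TSketchStringField = class(TSketchField)
  private
    FValue: AnsiString;
  protected
    function GetAsAnsiString: AnsiString; override;
  public
    constructor Create(const AValue: UnicodeString);
  end;

  // Hypothetical stand-in for TWideStringField: overrides the Unicode getter.
  TSketchWideStringField = class(TSketchField)
  private
    FValue: UnicodeString;
  protected
    function GetAsAnsiString: AnsiString; override;
    function GetAsWideString: UnicodeString; override;
  public
    constructor Create(const AValue: UnicodeString);
  end;

function TSketchField.GetAsWideString: UnicodeString;
begin
  Result := UnicodeString(GetAsAnsiString);
end;

constructor TSketchStringField.Create(const AValue: UnicodeString);
begin
  inherited Create;
  // Lossy for characters outside the active ANSI code page.
  FValue := AnsiString(AValue);
end;

function TSketchStringField.GetAsAnsiString: AnsiString;
begin
  Result := FValue;
end;

constructor TSketchWideStringField.Create(const AValue: UnicodeString);
begin
  inherited Create;
  FValue := AValue;
end;

function TSketchWideStringField.GetAsAnsiString: AnsiString;
begin
  Result := AnsiString(FValue);
end;

function TSketchWideStringField.GetAsWideString: UnicodeString;
begin
  // No detour through AnsiString, so nothing is lost here.
  Result := FValue;
end;

const
  Sample: UnicodeString = 'caf'#$00E9' '#$4E2D#$6587;
var
  Fields: array[0..1] of TSketchField;
  F: TSketchField;
begin
  Fields[0] := TSketchStringField.Create(Sample);
  Fields[1] := TSketchWideStringField.Create(Sample);
  for F in Fields do
  begin
    Writeln(F.ClassName, ' preserves the sample: ', F.AsWideString = Sample);
    F.Free;
  end;
end.
```

Both objects are read through the same base-class property, but only the descendant that overrides the getter avoids the Ansi round trip. Whether the Ansi-backed object loses characters depends on the active code page; on one that cannot represent the CJK characters in `Sample`, it would be expected to.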

Jeroen Wiert Pluimers
Henrick Hellström
  • Nice side effect of having two string field classes: database migrations to Unicode require replacing all TStringField in DFMs with TWideStringField (and many other source code changes), where developers would expect a smooth transition – mjn Feb 27 '12 at 06:27
  • @mjn, this approach will let you make the transition to unicode in your app without changing the underlying database fields. TWideStringField has been around for years and is not related to Delphi switching to Unicode. If the field in the database has been unicode before, you already had to use TWideStringField in say Delphi 5 anyway. You use TStringField if the database field is just an AnsiString. Delphi won't change your data definition and data in the database automagically. – Uwe Raabe Feb 27 '12 at 08:26
  • @mjn Actually no since normally it depends on the database if the value is unicode or not. So if your database was unicode in the old Delphi version it was TWideStringField already. If not why should it be in the newer version if the database still does not have unicode? – Stefan Glienke Feb 27 '12 at 08:37
  • @Uwe I mean migrating the database to Unicode, when Delphi 2009 is already in use - TWideStringField does not work for persistent ANSI fields – mjn Feb 27 '12 at 11:02
  • Using `TStringField` for `AnsiString` and `TWideStringField` for `string=UnicodeString` just does not make sense at all. It is IMHO wrong naming, *inconsistent with the rest of the VCL/RTL* and a source of [a lot of confusion](http://synopse.info/forum/viewtopic.php?pid=7452#p7452). – Arnaud Bouchez Apr 22 '13 at 05:20
  • It's not so simple as blaming the VCL. Underlying Database string types like char and varchar don't suddenly become Unicode, so neither should TStringField which has always been associated with `char(x)` and `varchar(x)` SQL types. What did anyone seriously expect a CAST to do when your underlying field db type is still ansi? Yes conversion is harder in this area, but if they made TStringField ambiguous they would have had to require 90% of devs to switch to what, a newly created "TAnsiStringField"? That would have been much worse. – Warren P Apr 22 '13 at 12:46
  • @WarrenP Isn't that what is expected from every programmer who is porting to a new version of Delphi? For example, I had to change one of my components in many places from String to AnsiString to make it work "as expected" in Delphi XE. – EMBarbosa Apr 24 '13 at 22:48
  • You totally didn't get my point, did you? Yes, your code is yours and you have to fix it. But my point is that Unicode isn't something you can guess or make any One Size Fits All assumptions about. The db area of Delphi has got certain underlying realities to deal with, and if people think about those realities then they won't be surprised so often. – Warren P Apr 24 '13 at 23:24
  • @WarrenP Agreed. That should not surprise us, yet the inconsistent naming is still surprising. – EMBarbosa Apr 25 '13 at 00:03
6

YES, you understood it correctly. It is the VCL and its documentation that are broken. Your confusion makes perfect sense!

In the Delphi 2009+ implementation, you have to use the AsString property for AnsiString and AsWideString for string=UnicodeString.
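As a rough illustration of choosing the accessor per field, here is a minimal sketch of a helper unit. The unit name `FieldUnicodeCheck` and the functions `IsUnicodeFieldType` and `ReadText` are hypothetical names introduced here; only `TField`, `TFieldType`, the `ft*` constants and the `AsString`/`AsWideString` properties come from Data.DB. It assumes that wide field types are the ones backed by Unicode-aware field classes.

```delphi
unit FieldUnicodeCheck;

interface

uses
  Data.DB;

// Hypothetical helper: True for the field types that carry wide (Unicode) text.
function IsUnicodeFieldType(DataType: TFieldType): Boolean;

// Hypothetical helper: reads a field's text through the matching accessor.
function ReadText(Field: TField): string;

implementation

function IsUnicodeFieldType(DataType: TFieldType): Boolean;
begin
  Result := DataType in [ftWideString, ftFixedWideChar, ftWideMemo];
end;

function ReadText(Field: TField): string;
begin
  if IsUnicodeFieldType(Field.DataType) then
    Result := Field.AsWideString  // Unicode-aware field classes
  else
    Result := Field.AsString;     // Ansi-backed fields such as TStringField
end;

end.
```

A caller would pass any field, for example `ReadText(DataSet.FieldByName('Name'))`, where the dataset and field name are placeholders. Whether characters survive still depends on the underlying database column and driver.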

In fact, the As*String properties are defined as such:

    property AsString: string read GetAsString write SetAsString;
    property AsWideString: UnicodeString read GetAsWideString write SetAsWideString;
    property AsAnsiString: AnsiString read GetAsAnsiString write SetAsAnsiString;

How on earth are we supposed to find out that AsString returns an AnsiString? It just does not make sense when compared to the rest of the VCL/RTL.

The implementation, which uses the TStringField class for AnsiString and TWideStringField for string=UnicodeString, is broken.

Furthermore, the documentation is also broken:

Data.DB.TField.AsString

Represents the field's value as a string (Delphi) or an AnsiString (C++).

This does not represent a string in Delphi, but an AnsiString! The fact that the property is declared with the plain string=UnicodeString type is thoroughly misleading.

From the database point of view, it is up to the DB driver to handle Unicode or work with a specific charset. But from the VCL point of view, in Delphi 2009+ you should only need to know about the string type, and be confident that using AsString: String will be Unicode-ready.

Arnaud Bouchez