
I'm trying to convert a string to UTF-8 using:

var
  encode: AnsiString;
begin
  encode := UTF8Encode('اختبار');
  ShowMessage(encode);
end;

It works fine in Delphi 7, but in Delphi XE2 the text is shown as question marks.

Any suggestions?

Mohammed Rabee
  • I've made an educated guess at the answer, but your actual code would have helped! It's not too late to add it to the question as an edit. – David Heffernan Apr 09 '12 at 11:58
  • As David says, it's amazing that anybody is willing to help you at all if you don't show your code. In particular, if you write `x := UTF8Encode(...`, what type is `x`? – Warren P Apr 09 '12 at 18:25
  • When I run your code it does not produce question marks. Is that the exact code you are running? How is your source code file encoded? What locale are you using on your machine? Did you try applying the points in my answer and/or Marco's article? – David Heffernan Apr 09 '12 at 20:37
  • @David Heffernan It's not the exact code, because I'm at home now. Anyway, this code gives a different result in Delphi 7 and Delphi XE2. I'm thankful to you anyway. – Mohammed Rabee Apr 09 '12 at 21:26

1 Answer

In your Delphi 7 code you probably wrote something like this:

var
  UTF8: string;
  InputString: WideString;//I guess that you used WideString
.....
UTF8 := UTF8Encode(InputString);

This was fine in Delphi 7, where string is an alias for AnsiString. In XE2 the generic string type is an alias for UnicodeString, which is UTF-16 encoded. That means that when the code above is compiled by XE2, the UTF-8 encoded buffer returned by UTF8Encode is interpreted as UTF-16 encoded text. That mismatch is what leads to your string full of question marks.

So, if you just wrote

var
  UTF8: AnsiString;
  InputString: string;//aliased to UnicodeString
.....
UTF8 := UTF8Encode(InputString);

then you would have the same behaviour as for your Delphi 7 code.

However, this is not the way to do it in Unicode Delphi. Instead you should use the UTF8String type. This is defined as AnsiString(65001), which means a string of 8-bit character units with code page 65001, i.e. the UTF-8 code page. When you do this you don't need to call UTF8Encode at all, since the encoding attached to the string type means that the compiler can generate the code to convert the string. Now you would simply write:

var
  UTF8: UTF8String;
  InputString: string;//aliased to UnicodeString
.....
UTF8 := InputString;
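
As a rough illustration of the assignment above, here is a minimal console sketch, assuming Delphi 2009 or later. The program name Utf8Demo and the RoundTrip variable are hypothetical, and the test word is built from code points so that the result does not depend on how the source file is saved. The explicit casts perform the same conversion as the plain assignment; they are only there to keep the compiler from warning about an implicit string cast.

```delphi
program Utf8Demo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  InputString: string;   // UnicodeString (UTF-16) in Delphi 2009 and later
  UTF8: UTF8String;      // AnsiString(65001)
  RoundTrip: string;     // hypothetical helper variable
begin
  // The word from the question, written as UTF-16 code units.
  InputString := #$0627#$062E#$062A#$0628#$0627#$0631;

  // The cast converts UTF-16 to UTF-8; no call to UTF8Encode is needed.
  UTF8 := UTF8String(InputString);

  // Six Arabic letters: 6 UTF-16 code units, 2 UTF-8 bytes each.
  Writeln('UTF-16 code units: ', Length(InputString));
  Writeln('UTF-8 bytes: ', Length(UTF8));

  // Converting back should give the original text again.
  RoundTrip := string(UTF8);
  Writeln('Round trip intact: ', RoundTrip = InputString);
end.
```

The lengths are printed rather than the text itself because a Windows console may not be able to display Arabic characters, which would hide whether the conversion worked.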

The principal reference for the Unicode aspects of Delphi 2009 and later is Marco Cantù's white paper, Delphi and Unicode, which I recommend that you read before proceeding.

David Heffernan