Can someone help me with the right way to display a UTF-8 Unicode string?

I am calling a procedure that receives a text string from a web service. The procedure works fine and the string is received perfectly. However, since the string contains UTF-8 text, the Unicode letters are displayed as numbers:

{"displayName":"\u062a\u0637\u0628\u064a\u0640\u0640\u0640\u0642 \u062f\u0639\u0640\u0640\u0640\u0640\u0640\u0627\u0621"

Delphi Berlin should support UTF-8, but I do not know which function to use to decode the UTF-8 and display the text (Arabic text).

Procedure TF_Main.GnipHTTPSTransfer(Sender: TObject; Direction: Integer; BytesTransferred: Int64; PercentDone: Integer; Text: String);
Begin
  Inc(Transfer_Count);
  L_Counter.Caption:=IntToStr(Transfer_Count);
  write(GNIP_Text_File, Text);
  M_Memo.Lines.Add(text);
End;
Remy Lebeau
  • I know very little about Delphi, but are you sure that code example is correct? It looks like it has syntax errors; your example should be a [mcve]. – Martin Tournoij Sep 03 '16 at 00:39
  • Your code sample is invalid; it won't compile. That's also not a UTF-8 string - it's an encoded UTF-8 string. Berlin doesn't have any way to know that what you're adding to the memo is anything other than the exact value you're providing it. Why would you expect it to behave any differently than it is? You're telling it to put the string `'\u062a\u0637\u0628\u064a\u0640\u0640\u0640\u0642 \u062f\u0639\u0640\u0640\u0640\u0640\u0640\u0627\u0621'` into a memo control, and it's doing precisely what your code says to do. – Ken White Sep 03 '16 at 00:46
  • Actually, it is an encoded Unicode string, no matter what specific encoding it comes from. `\uxxxx` can be decoded as UTF-8 as well as UTF-16 or UTF-32. – Rudy Velthuis Sep 03 '16 at 09:52

1 Answer

The string is not UTF-8. Even if it was transferred over HTTP using UTF-8, it is no longer UTF-8 in your Text string; it is UTF-16 instead. Its content is a JSON-encoded object, which has a displayName field containing Unicode characters that are encoded using escape sequence notation (which is not strictly required in JSON, but is nonetheless supported). Each \uXXXX is the escaped textual representation of a UTF-16 code unit value (\u062a is Unicode code point U+062A ARABIC LETTER TEH, \u0637 is U+0637 ARABIC LETTER TAH, etc).

Delphi has a JSON framework, which will decode the escape sequences for you. For example:

uses
  ..., System.JSON;

procedure TF_Main.GnipHTTPSTransfer(Sender: TObject; Direction: Integer; BytesTransferred: Int64; PercentDone: Integer; Text: String);
var
  JsonVal: TJSONValue;
  JsonObj: TJSONObject;
begin
  Inc(Transfer_Count);
  L_Counter.Caption := IntToStr(Transfer_Count);
  write(GNIP_Text_File, Text);
  M_Memo.Lines.Add(Text);

  JsonVal := TJSONObject.ParseJSONValue(Text);
  if JsonVal <> nil then
  try
    JsonObj := JsonVal as TJSONObject;
    M_Memo.Lines.Add(JsonObj.Values['displayName'].Value); // تطبيـــق دعـــــاء
  finally
    JsonVal.Free;
  end;
end;
Remy Lebeau
  • I modified the code as suggested but ALWAYS get an "Access violation at address 005F1060 in Module GNIP_Consumer.exe. Read of Address 00000008" – Abdullah Aldahlawi Sep 03 '16 at 07:41
  • `ParseJSONValue()` returns a nil pointer if the parsing fails. I have updated my example to reflect that. – Remy Lebeau Sep 03 '16 at 08:10
  • The modified code solved the problem and the Arabic text is shown for the JSON value specified (displayName). However, my ultimate objective is to retrieve the WHOLE returned 'Text' in one variable/string in order to store it in a text file for further processing. I am not sure if that can be done in Delphi? – Abdullah Aldahlawi Sep 03 '16 at 11:18
  • @Abdullah: Of course it can. You're putting it into a TMemo (with M_Memo.Lines.Add). Once you get it all in that memo, use M_Memo.Lines.SaveToFile to save the memo's content to whatever file you want. – Ken White Sep 03 '16 at 14:48
  • You are right Ken, but that way I will have to parse the whole JSON structure and extract every escape sequence individually using the Values[' '] index for each JSON pair. This is a lengthy process! There must be a direct conversion that maps a string containing escape sequences to its corresponding foreign-language characters. – Abdullah Aldahlawi Sep 03 '16 at 15:47
  • @AbdullahAldahlawi you need to parse the string. If you don't want to use the existing JSON parser, then parse the string yourself manually. JSON is a very simple format to parse, especially if you are only interested in escape sequences and nothing else. But really, what processing are you wanting to do that doesn't involve extracting strings from the JSON first? – Remy Lebeau Sep 03 '16 at 17:58
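
For the follow-up in the comments (keeping the whole Text, but with the escapes turned into real characters), one option is to decode only the \uXXXX sequences and leave the rest of the JSON alone. The sketch below has not been tested against the real feed and rests on a few assumptions: DecodeUnicodeEscapes and TryHex4 are hypothetical helper names, not RTL functions; strings are 1-based (the desktop compiler default); and escapes that decode to a quote, a backslash, or a control character are deliberately left escaped so that the output should still be valid JSON.

```delphi
uses
  System.SysUtils;

// Hypothetical helper: reads 4 hex digits starting at S[Index].
// Returns False if the string is too short or a digit is not hexadecimal.
function TryHex4(const S: string; Index: Integer; out Code: Integer): Boolean;
var
  K, Digit: Integer;
begin
  Result := False;
  Code := 0;
  if Index + 3 > Length(S) then
    Exit;
  for K := Index to Index + 3 do
  begin
    case S[K] of
      '0'..'9': Digit := Ord(S[K]) - Ord('0');
      'a'..'f': Digit := Ord(S[K]) - Ord('a') + 10;
      'A'..'F': Digit := Ord(S[K]) - Ord('A') + 10;
    else
      Exit;
    end;
    Code := Code * 16 + Digit;
  end;
  Result := True;
end;

// Hypothetical helper: replaces \uXXXX escapes with the UTF-16 code units
// they denote and copies everything else unchanged. Assumes 1-based strings.
function DecodeUnicodeEscapes(const S: string): string;
var
  I, Code: Integer;
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create(Length(S));
  try
    I := 1;
    while I <= Length(S) do
    begin
      if (S[I] = '\') and (I < Length(S)) then
      begin
        if (S[I + 1] = 'u') and TryHex4(S, I + 2, Code) and
           (Code >= $20) and (Code <> $22) and (Code <> $5C) then
        begin
          // A decodable escape: emit the code unit itself.
          // Surrogate pairs arrive as two escapes and are emitted in order.
          SB.Append(Char(Code));
          Inc(I, 6);
        end
        else
        begin
          // Any other escape (\\, \", \n, or a \u kept on purpose):
          // copy the backslash and the next character together so that
          // an escaped backslash is never mistaken for the start of \u.
          SB.Append(S[I]).Append(S[I + 1]);
          Inc(I, 2);
        end;
      end
      else
      begin
        SB.Append(S[I]);
        Inc(I);
      end;
    end;
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;
```

In the handler this could be used as M_Memo.Lines.Add(DecodeUnicodeEscapes(Text)), and the memo could then be saved with M_Memo.Lines.SaveToFile(FileName, TEncoding.UTF8), where FileName is a placeholder. Passing an explicit Unicode encoding matters here, because a file written in the default ANSI code page may not be able to hold the Arabic characters.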