First I get a TMemoryStream from an HTTP request; it contains the body of the response. Then I load it into a TStringList and store the text in a WideString (I also tried AnsiString).

The problem is that I need to convert the string: the users' language is Spanish, so accented vowels are very common, and I need to store that text correctly.

lServerResponse := TStringList.Create;
lServerResponse.LoadFromStream(lResponseMemoryStream);

lStringResponse := lServerResponse.Text;
lDecodedResponse := Utf8Decode(lStringResponse);

If (part of) the response is "Hólá Múndó", lStringResponse will hold the raw UTF-8 bytes displayed as ANSI text, "HÃ³lÃ¡ MÃºndÃ³", and lDecodedResponse will be "Hólá Múndó".

But if the user adds an emoji (lStringResponse will be "HÃ³lÃ¡ MÃºndÃ³ ðŸ˜€" if the emoji is 😀), Utf8Decode fails and returns an empty string. Is there a way to get just the ANSI characters from a string (or a TMemoryStream), or to remove whatever Utf8Decode can't convert?

Thanks for your time.

Daniel

1 Answer

TMemoryStream is just raw bytes. There is no reason to load that stream into a TStringList just to extract a (Wide|Ansi)String from it. You can assign the bytes directly to an AnsiString/UTF8String using SetString() instead, e.g.:

var
  lStringResponse: UTF8String;
  lDecodedResponse: WideString;
begin
  SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
  lDecodedResponse := UTF8Decode(lStringResponse);
end;

Just make sure the HTTP content really is encoded as UTF-8, or else this approach will not work.
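One way to check that caveat at runtime is sketched below. It asks MultiByteToWideChar() to reject malformed input by passing MB_ERR_INVALID_CHARS. IsValidUtf8 is a hypothetical helper name, not an RTL function, and the sketch assumes the OS accepts that flag together with CP_UTF8, which is documented for current Windows versions; older systems may behave differently, so treat the result as a hint rather than a guarantee.

```pascal
program CheckUtf8;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper (not part of the RTL): True when Buf holds well-formed UTF-8.
function IsValidUtf8(Buf: PAnsiChar; Len: Integer): Boolean;
begin
  if Len <= 0 then
    Result := True  // an empty body has nothing to reject
  else
    // With MB_ERR_INVALID_CHARS the call returns 0 on malformed input
    // (assumption: the OS supports this flag for CP_UTF8).
    Result := MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, Buf, Len, nil, 0) > 0;
end;

const
  Good: AnsiString = 'H'#$C3#$B3'la';  // "Hóla" encoded as UTF-8
  Bad: AnsiString = 'H'#$F3'la';       // "Hóla" encoded as Latin-1, not valid UTF-8
begin
  Writeln(IsValidUtf8(PAnsiChar(Good), Length(Good)));
  Writeln(IsValidUtf8(PAnsiChar(Bad), Length(Bad)));
end.
```

With the stream from the question, the call would look like IsValidUtf8(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size). Checking the charset in the response's Content-Type header, when the server sends one, is a useful complement.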

That being said - UTF8Decode() (and UTF8Encode()) in Delphi 7 DO NOT support Unicode codepoints above U+FFFF, which means they DO NOT support Emojis at all. That was fixed in Delphi 2009.
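To see why emoji are affected, it may help to work through the arithmetic. A code point above U+FFFF takes four bytes in UTF-8 and two 16-bit code units (a surrogate pair) in UTF-16, which is what a WideString stores. The sketch below is only an illustration of that mapping, with a hypothetical helper name (DecodeUtf8Quad); it does no validation and is not a replacement for a real decoder.

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Illustrative helper (hypothetical name): decodes one 4-byte UTF-8 sequence
// and splits the resulting code point into a UTF-16 surrogate pair.
// No validation is performed on the input bytes.
procedure DecodeUtf8Quad(B1, B2, B3, B4: Byte;
  out CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx carries 21 payload bits
  CodePoint := (Cardinal(B1 and $07) shl 18) or (Cardinal(B2 and $3F) shl 12) or
               (Cardinal(B3 and $3F) shl 6) or Cardinal(B4 and $3F);
  // Subtract $10000, then put the top 10 bits in the high surrogate
  // and the bottom 10 bits in the low surrogate.
  V := CodePoint - $10000;
  HighSur := $D800 + Word(V shr 10);
  LowSur := $DC00 + Word(V and $3FF);
end;

var
  CP: Cardinal;
  H, L: Word;
begin
  // F0 9F 98 80 is the UTF-8 encoding of the emoji from the question
  DecodeUtf8Quad($F0, $9F, $98, $80, CP, H, L);
  Writeln('U+', IntToHex(Integer(CP), 4), ' -> ', IntToHex(H, 4), ' ', IntToHex(L, 4));
  // prints: U+1F600 -> D83D DE00
end.
```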

To work around that issue in earlier versions, you can use the Win32 API MultiByteToWideChar() function instead, eg:

uses
  ..., Windows;

function My_UTF8Decode(const S: UTF8String): WideString;
var
  WLen: Integer;
begin
  WLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
  if WLen > 0 then
  begin
    SetLength(Result, WLen);
    MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), PWideChar(Result), WLen);
  end else
    Result := '';
end;

var
  lStringResponse: UTF8String;
  lDecodedResponse: WideString;
begin
  SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
  lDecodedResponse := My_UTF8Decode(lStringResponse);
end;

Alternatively:

uses
  ..., Windows;

function My_UTF8Decode(const S: PAnsiChar; const SLen: Integer): WideString;
var
  WLen: Integer;
begin
  WLen := MultiByteToWideChar(CP_UTF8, 0, S, SLen, nil, 0);
  if WLen > 0 then
  begin
    SetLength(Result, WLen);
    MultiByteToWideChar(CP_UTF8, 0, S, SLen, PWideChar(Result), WLen);
  end else
    Result := '';
end;

var
  lDecodedResponse: WideString;
begin
  lDecodedResponse := My_UTF8Decode(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
end;

Or, use a 3rd party Unicode conversion library, like ICU or libiconv, which handle this for you.
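If losing the emoji is acceptable, as the question suggests, another option is to drop every 4-byte UTF-8 sequence before decoding, so that the Delphi 7 UTF8Decode() only sees code points it can handle. The following is a minimal sketch under that assumption; StripSupplementary is a hypothetical helper, and it assumes the input is otherwise well-formed UTF-8.

```pascal
program StripDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns S without any 4-byte UTF-8 sequences
// (code points above U+FFFF, which covers most emoji).
function StripSupplementary(const S: AnsiString): AnsiString;
var
  I, J, K, Len: Integer;
begin
  Len := Length(S);
  SetLength(Result, Len);
  I := 1;
  J := 0;
  while I <= Len do
  begin
    if (Ord(S[I]) and $F8) = $F0 then
    begin
      // Lead byte 11110xxx: skip it and up to three continuation bytes (10xxxxxx).
      Inc(I);
      K := 0;
      while (K < 3) and (I <= Len) and ((Ord(S[I]) and $C0) = $80) do
      begin
        Inc(I);
        Inc(K);
      end;
    end
    else
    begin
      // Anything else (ASCII, 2-byte and 3-byte sequences) is copied unchanged.
      Inc(J);
      Result[J] := S[I];
      Inc(I);
    end;
  end;
  SetLength(Result, J);
end;

var
  Raw, Clean: AnsiString;
begin
  Raw := 'Hi '#$F0#$9F#$98#$80'!';  // "Hi ", U+1F600 as UTF-8, "!"
  Clean := StripSupplementary(Raw);
  Writeln(Length(Raw), ' -> ', Length(Clean));  // prints: 8 -> 4
end.
```

In the question's code this would be used as lDecodedResponse := UTF8Decode(StripSupplementary(lStringResponse)). Note that a few emoji and related characters (for example U+263A, or the U+200D and U+FE0F joiners) live at or below U+FFFF and would pass through untouched.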

Remy Lebeau
  • [ICONV v1.16 pre-compiled DLLs can be downloaded here](https://github.com/pffang/libiconv-for-Windows/tree/master/lib64). – AmigoJack Feb 17 '21 at 01:09
  • ... or use an actual Delphi version – Delphi Coder Feb 17 '21 at 02:05
  • I had already tried to solve the problem using MultiByteToWideChar without success, I'm glad I asked. Thank you. – Daniel Feb 17 '21 at 02:17
  • Delphi Coder, I'm just an employee, I can't make those decisions – Daniel Feb 17 '21 at 02:19
  • @Daniel Maybe you can't make decisions on that, but you can demand the best tools for your job. See #9 on [Joel's checklist](https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/). – Peter Wolf Feb 17 '21 at 08:47