
I am having trouble converting text within an edit box to a WideChar. This is being used in code for printing emoji characters.

If I set the WideChar values manually, like this, it works:

```pascal
Emoji[1] := WideChar($D83D);
Emoji[2] := WideChar($DC4D);
```

But I want to be able to set the hex codes via edit boxes, like this:

```pascal
StringToWideChar(edit1.text, @wc1, Length(edit1.text));
StringToWideChar(edit2.text, @wc2, Length(edit2.text));
Emoji[1] := wc1;
Emoji[2] := wc2;
```

wc1 and wc2 are declared as WideChar, and the edit boxes contain the same values as those hard-coded above. That code produces blank output, so something is wrong with the conversion.

What am I doing wrong? Thanks for any help here.

J...
Some1Else
  • Why not just type the actual Emoji into the `TEdit` and then use its `Text` as-is? [`StringToWideChar()`](http://docwiki.embarcadero.com/Libraries/en/System.StringToWideChar) doesn't do what you think it does. It is meant for converting a `String` to a `WideChar[]` buffer of equivalent length (ie, originally for converting `AnsiString` to `PWideChar`, now just a plain copy). It is not meant for parsing a whole `String` into a single `WideChar`. – Remy Lebeau Apr 27 '21 at 22:02
  • I'm on 10.4, so Andreas' code (with Remy's edit) works fine: `Emoji[1] := Char(StrToInt(Edit1.Text));` – Some1Else Apr 27 '21 at 22:04
  • Why use a separate `TEdit` for each UTF-16 codeunit? Why not use a single `TEdit` to enter a whole codepoint? If you don't want the user to enter the actual Emoji symbol, then at least enter its codepoint value (ie, `'$1F44D'`) and then you can convert that to an integer with `StrToInt()` and then use [`TCharacter.ConvertFromUtf32()`](http://docwiki.embarcadero.com/Libraries/en/System.Character.TCharacter.ConvertFromUtf32) or [`TCharHelper.ConvertFromUtf32()`](http://docwiki.embarcadero.com/Libraries/en/System.Character.TCharHelper.ConvertFromUtf32) to convert that to a proper `string`. – Remy Lebeau Apr 27 '21 at 22:31
  • Thank you for the tip Remy. Using the whole codepoint and converting with ConvertFromUtf32 is much cleaner code and simpler. – Some1Else Apr 28 '21 at 00:36
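The single-edit-box approach suggested in the comments above might look roughly like the sketch below. It is a minimal console program assuming Delphi 10.4 (the asker's version) and the `TCharHelper.ConvertFromUtf32()` method linked above, called here as `Char.ConvertFromUtf32()`. The helper name `CodePointTextToString` is hypothetical, and a string literal stands in for `Edit1.Text`.

```pascal
program CodePointDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,  // StrToInt
  System.Character; // TCharHelper (Char.ConvertFromUtf32)

// Hypothetical helper: converts text such as '$1F44D' (what a user might
// type into a single TEdit) into the UTF-16 string for that codepoint.
function CodePointTextToString(const S: string): string;
var
  CodePoint: Integer;
begin
  // StrToInt understands the '$' hex prefix and raises EConvertError on bad input
  CodePoint := StrToInt(S);
  // ConvertFromUtf32 emits a surrogate pair for codepoints above $FFFF;
  // surrogate or out-of-range values are expected to raise an exception
  Result := Char.ConvertFromUtf32(CodePoint);
end;

var
  Emoji: string;
begin
  Emoji := CodePointTextToString('$1F44D');
  Writeln(Length(Emoji));        // 2 (one surrogate pair)
  Writeln(Emoji = #$D83D#$DC4D); // same value as the question's hard-coded pair
end.
```

In a form, something like `Label1.Caption := CodePointTextToString(Edit1.Text);` would replace the two edit boxes; if the input is untrusted, guard it with `TryStrToInt` or a `try..except` block.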

1 Answer


You mustn't interpret the string '$D83D' as text; instead, you must parse it as an integer.

First, you need to obtain the text from the edit box. This is Edit1.Text. Then you need to convert this to an integer. For instance, you can use StrToInt or TryStrToInt. Then you simply need to reinterpret (cast) this integer as a Char:

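```pascal
// Requires System.SysUtils (TryStrToInt) and System.Math (InRange) in the unit's uses clause
procedure TForm1.Edit1Change(Sender: TObject);
var
  CodeUnit: Integer;
begin
  // TryStrToInt accepts '$' hex notation, e.g. '$D83D'
  if TryStrToInt(Edit1.Text, CodeUnit) and InRange(CodeUnit, 0, $FFFF) then
    Label1.Caption := Char(CodeUnit) // reinterpret the integer as a UTF-16 code unit
  else
    Label1.Caption := '';
end;
```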

Here, as a bonus, I also use InRange (from System.Math) to validate that the supposed codeunit is actually a 16-bit unsigned integer; after all, the user could type 123456789. Delphi's StrToInt functions support hex using the dollar-sign notation.
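Applied to the question's two edit boxes, the same parse-then-cast idea could be sketched as follows. `TryBuildSurrogatePair` is a hypothetical helper, not a library routine, and the sketch assumes the boxes hold the two UTF-16 code units as `'$D83D'` and `'$DC4D'`. Instead of the answer's `0..$FFFF` check, it narrows each value to the high-surrogate (`$D800..$DBFF`) or low-surrogate (`$DC00..$DFFF`) range, so a swapped or incomplete pair is rejected rather than printed as garbage.

```pascal
program SurrogatePairDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, // TryStrToInt
  System.Math;     // InRange

// Hypothetical helper: combines two code-unit strings (e.g. the text of
// Edit1 and Edit2) into one string. Fails unless they form a valid
// high/low UTF-16 surrogate pair, in that order.
function TryBuildSurrogatePair(const HighText, LowText: string;
  out Emoji: string): Boolean;
var
  HighUnit, LowUnit: Integer;
begin
  Emoji := '';
  Result := TryStrToInt(HighText, HighUnit) and InRange(HighUnit, $D800, $DBFF)
    and TryStrToInt(LowText, LowUnit) and InRange(LowUnit, $DC00, $DFFF);
  if Result then
  begin
    SetLength(Emoji, 2);
    // Same Char() type-cast as in the answer, once per code unit
    Emoji[1] := Char(HighUnit);
    Emoji[2] := Char(LowUnit);
  end;
end;

var
  Emoji: string;
begin
  if TryBuildSurrogatePair('$D83D', '$DC4D', Emoji) then
    Writeln('Valid pair, length ', Length(Emoji))
  else
    Writeln('Invalid input');
  // The reversed order is not a valid pair, so this reports False
  Writeln(TryBuildSurrogatePair('$DC4D', '$D83D', Emoji));
end.
```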

Andreas Rejbrand
  • Technically, this is not actually working with *codepoints* at all, but rather with *codeunits* instead. That is an important distinction to make. U+1F44D (👍) is a *codepoint*, `D83D DC4D` is a UTF-16 *codeunit* sequence (aka a *surrogate pair*). But yes, `0..$FFFF` is the correct range for a UTF-16 *codeunit*, which is what Delphi's `WideChar` represents. And you should be using a `Char()` type-cast, not `Chr()`. – Remy Lebeau Apr 27 '21 at 22:12
  • @RemyLebeau: Why do you prefer `Char` instead of `Chr`? – Andreas Rejbrand Apr 27 '21 at 22:16
  • Well, for one thing, because the `Chr()` documentation says to. And second, because `Chr(X)` is not guaranteed to always return a `Char` with a value of `X` when `X` is `128..255`, whereas `Char(X)` is guaranteed to. – Remy Lebeau Apr 27 '21 at 22:25
  • @RemyLebeau: Can you give an example of a Delphi 2009+ piece of code in which `Chr` and `Char` don't yield the same result when given an integer in the range `0..$FFFF`? I partly agree with both your reasons, but only partly (the [documentation](http://docwiki.embarcadero.com/Libraries/Sydney/en/System.Chr) doesn't say you mustn't use `Chr`, and in practice I don't think I have ever seen a difference). – Andreas Rejbrand Apr 27 '21 at 22:33
  • For example, I've seen cases where `Chr(128)` may return `Char($20AC)` (`0x80` is the Euro symbol in some charsets), whereas `Char(128)` is always `Char($80)`. – Remy Lebeau Apr 27 '21 at 22:37
  • @RemyLebeau: That doesn't happen on my 10.3.2 in Windows, 32-bit. I imagine it could depend on the OS (legacy?) locale settings, but looking at the assembly, that appears not to be the case. (I vaguely recall a compiler directive related to this, though.) – Andreas Rejbrand Apr 27 '21 at 22:39
  • It is possible that it has since been fixed, but it wasn't always like that, and I learned a long time ago not to trust `Chr()` when converting an integer to a character, when a simple type-cast will suffice. – Remy Lebeau Apr 27 '21 at 22:47
  • @RemyLebeau: I agree that `#128` is problematic unless `{$HIGHCHARUNICODE ON}`, but `Chr(128)` seems safe. Still, I really don't mind using `Char` instead, so I have changed it (even though I don't think it makes any difference whatsoever in this case). – Andreas Rejbrand Apr 27 '21 at 22:50
  • Also, if `x` is a UTF-16 codeunit within 0..$FFFF, isn't it also a (BMP) codepoint? – Andreas Rejbrand Apr 27 '21 at 22:52
  • No, because of surrogates in the `$D800..$DFFF` range, which are not valid codepoints. – Remy Lebeau Apr 27 '21 at 22:53
  • I know of these, but I thought they could be called "codepoints" too (albeit useless on their own). – Andreas Rejbrand Apr 27 '21 at 22:54
  • Obviously, you can have integers with values in that range and *say* they represent codepoints, but from the perspective of the Unicode standard, *codepoints* in that range are strictly forbidden. Sure, the range is technically part of the BMP, but it is reserved for UTF-16's exclusive use. – Remy Lebeau Apr 27 '21 at 22:56
  • @RemyLebeau: Okay. – Andreas Rejbrand Apr 27 '21 at 22:58
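To make the codepoint/codeunit distinction from the comments concrete, here is a rough sketch of the UTF-16 surrogate arithmetic: subtract `$10000` from a supplementary codepoint, then put the top 10 bits into a high surrogate and the low 10 bits into a low surrogate. For U+1F44D the offset is `$F44D`, giving `$D83D` and `$DC4D`, exactly the two values hard-coded in the question. The helper names `SplitToSurrogates` and `IsSurrogateValue` are hypothetical; in real code `Char.ConvertFromUtf32()` does this work for you.

```pascal
program SurrogateMath;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // EArgumentOutOfRangeException, IntToHex

// Hypothetical helper: splits a supplementary codepoint ($10000..$10FFFF)
// into its UTF-16 high and low surrogate code units.
procedure SplitToSurrogates(CodePoint: Integer; out HighUnit, LowUnit: Char);
var
  Offset: Integer;
begin
  if (CodePoint < $10000) or (CodePoint > $10FFFF) then
    raise EArgumentOutOfRangeException.Create('Not a supplementary codepoint');
  Offset := CodePoint - $10000;               // 20-bit value, $00000..$FFFFF
  HighUnit := Char($D800 + (Offset shr 10));  // top 10 bits -> $D800..$DBFF
  LowUnit := Char($DC00 + (Offset and $3FF)); // low 10 bits -> $DC00..$DFFF
end;

// Hypothetical helper: True for values in the range reserved for surrogates,
// which are valid UTF-16 code units but never valid codepoints on their own.
function IsSurrogateValue(Value: Integer): Boolean;
begin
  Result := (Value >= $D800) and (Value <= $DFFF);
end;

var
  HighUnit, LowUnit: Char;
begin
  SplitToSurrogates($1F44D, HighUnit, LowUnit);
  Writeln(IntToHex(Ord(HighUnit), 4), ' ', IntToHex(Ord(LowUnit), 4)); // D83D DC4D
  Writeln(IsSurrogateValue($D83D)); // a code unit, but not a codepoint
end.
```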