I have Python (3.6) code that should pass a Unicode string to an Unreal Engine project. Unreal Engine displays text in TEXT format which, if I'm not wrong, is an array of WinAPI TCHARs. On my platform a TCHAR is 2 bytes.

Here is how I encode the string on the Python side:

by = bytes(st, 'utf-8')

I tried encoding and passing the string "Hello". Unreal received the data ['H', 'e', 'l', 'l', 'o'] (1 byte per char) and printed "效汬o" (it treats "He" and "ll" each as a single Unicode character).
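
The garbled output can be reproduced in Python alone. This is a minimal sketch, assuming the receiver reads the buffer as little-endian 16-bit units and that a zero byte follows the data (the engine-side code isn't shown, so both are assumptions):

```python
# What the UTF-8 encoding produces: one byte per ASCII character.
utf8_bytes = "Hello".encode("utf-8")

# Assumption: a zero byte follows the data in the receiving buffer,
# padding the odd fifth byte to a full 16-bit unit.
buffer_as_received = utf8_bytes + b"\x00"

# Reading the same bytes as little-endian 16-bit code units pairs them up:
# 'H','e' -> 0x6548, 'l','l' -> 0x6C6C, 'o',0x00 -> 0x006F.
misread = buffer_as_received.decode("utf-16-le")

print(misread)                          # 效汬o
print([hex(ord(c)) for c in misread])   # ['0x6548', '0x6c6c', '0x6f']
```

So the bytes arrive intact; only their interpretation differs between the two sides.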

How can I fix this?

  • Should I change the encoding on the Python side to always generate 2 bytes per char?
  • Should I decode the resulting byte array on the Unreal side to TCHAR Unicode somehow?
Elad Weiss
  • TCHAR is related to the Win32 API, which you should at least mention. Might need to also tag your question "windows"... – martineau Oct 23 '18 at 12:09
  • `TCHAR` is either ANSI or UTF-16. It depends on compiler defines. Probably what you really mean is that the data is an array of `wchar_t`. – David Heffernan Oct 23 '18 at 12:36
  • How are you passing data to the engine? ctypes? If so, just use the `str` type directly. ctypes knows to pass `str` as UTF-16 to functions using wchar_t. – Mark Tolonen Oct 23 '18 at 12:59
  • @dav: Or ASCII, if neither `_UNICODE` nor `_MBCS` is defined. – IInspectable Oct 23 '18 at 16:22

1 Answer

Given your configuration, TCHAR maps to wchar_t, a character type that is always encoded as UTF-16LE on Windows.

You can encode the string, without a leading byte order mark, using:

by = bytes(st, 'utf-16-le')
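
One detail worth checking: Python's generic 'utf-16' codec prepends a 2-byte byte order mark (BOM) and uses the machine's native byte order, while 'utf-16-le' emits only the little-endian code units. The sketch below compares the two; the trailing 16-bit null terminator is a hypothetical addition, needed only if the receiving code expects a null-terminated string:

```python
import codecs

st = "Hello"

with_bom = st.encode("utf-16")        # BOM followed by native-order code units
without_bom = st.encode("utf-16-le")  # explicit little-endian, no BOM

# The generic codec output starts with a BOM and is 2 bytes longer.
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert len(with_bom) == len(without_bom) + 2

# Hypothetical: append a 16-bit null terminator if the receiver expects one.
payload = without_bom + b"\x00\x00"

print(without_bom)  # b'H\x00e\x00l\x00l\x00o\x00'
```

If the BOM is sent, the receiver sees it as an extra U+FEFF character at the start of the text, which typically renders as nothing and so can go unnoticed.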
IInspectable
emirc
  • Works perfectly. So simple. – Elad Weiss Oct 23 '18 at 12:42
  • No, a `TCHAR` isn't UTF-16. The meaning (and fundamental type) depends on external settings, namely whether or not `_UNICODE` or `_MBCS` is defined. Details are explained in the [generic-text mappings](https://learn.microsoft.com/en-us/cpp/c-runtime-library/generic-text-mappings) topic. – IInspectable Oct 23 '18 at 16:19
  • Sure, I agree. But given the output shown in the question it was clear the settings here are Unicode. – emirc Oct 23 '18 at 16:29