
I've just started doing some Windows programming.

I'm trying to decide how best to handle non-ASCII text.

I'd prefer to use 8-bit characters rather than 16-bit, i.e. declare all my strings as char rather than wchar_t.

I've read the UTF-8 Everywhere proposals, and I think they misrepresent the current state of Windows.

Since Windows 10 version 1803 (10.0.17134.0), support for a UTF-8 code page has been implemented to the same standard as other multibyte character encodings.

I think now that I can:

  • Ensure Visual Studio stores source code as UTF-8 by using an EditorConfig file, and make the compiler treat both the source and narrow string literals as UTF-8 by specifying '/utf-8' as an "additional" option in C/C++/Command Line.

  • Make sure the system knows the program is using UTF-8 character strings by calling setlocale(LC_ALL, ".UTF-8"); and/or by setting <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage> in the manifest. (The system will expect UTF-8 by default if 'Beta: Use Unicode UTF-8 for worldwide language support' is ticked in Region/Language/Administrative language settings/Region Settings. I believe this sets the active code page to UTF-8, and that it is the default for Windows 11.)

  • Leave UNICODE and _UNICODE undefined in the source, so that the Win32 'ANSI' (-A) interfaces are used. Windows will convert any text to UTF-16 internally.

  • Use the standard strings and char variables I'm used to, rather than wstring and wchar_t.
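
To make the steps above concrete, here is a minimal sketch, assuming a source file saved as UTF-8 and compiled with /utf-8. The helper names (enable_utf8_locale, ansi_code_page_is_utf8, show_sample) and the sample string are hypothetical, made up for illustration. As far as I understand, setlocale only affects the C runtime, while the manifest setting determines what GetACP reports and how the -A functions interpret char strings, so the sketch checks the two separately. The Windows-only parts are guarded with _WIN32 so the rest compiles elsewhere.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#endif

// Narrow literal with non-ASCII text. Assumption: the file is saved as UTF-8
// and built with /utf-8 (GCC and Clang use UTF-8 by default), so the
// character U+00EF is stored as the two bytes 0xC3 0xAF.
const char kSample[] = "naïve";

// Hypothetical helper: asks the C runtime to use a UTF-8 locale.
// Returns false if the runtime rejects the ".UTF-8" locale name
// (for example an older UCRT, or a C library that does not know this name).
bool enable_utf8_locale() {
    return std::setlocale(LC_ALL, ".UTF-8") != nullptr;
}

// Hypothetical helper: reports whether the process's active (ANSI) code page
// is UTF-8, which is what the -A functions use to interpret char strings.
bool ansi_code_page_is_utf8() {
#ifdef _WIN32
    return GetACP() == CP_UTF8;  // CP_UTF8 is 65001
#else
    return true;  // assumption for this sketch: no ANSI code page off Windows
#endif
}

// Hypothetical helper: passes the UTF-8 bytes straight to an -A function.
void show_sample() {
    // Debug-build guard: the call below is only meaningful when the
    // active code page is UTF-8.
    assert(ansi_code_page_is_utf8());
#ifdef _WIN32
    MessageBoxA(nullptr, kSample, "UTF-8 test", MB_OK);
#endif
}
```

On Windows, show_sample needs user32.lib at link time. Checking GetACP at startup seems a cheap way to notice a missing or ignored manifest entry before any text is displayed incorrectly.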

Have I got this right?

Is there anything else I need to do, apart from watching out for any code that in some way depends on a single character being held in a single byte?
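
As an illustration of the kind of code that depends on one character per byte, here is a small sketch, assuming valid UTF-8 input. The function names count_code_points and truncate_utf8 are hypothetical. It shows that the byte length and the number of code points differ, and that cutting a string at an arbitrary byte offset can split a multi-byte sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Returns a prefix of at most max_bytes bytes that does not end in the
// middle of a multi-byte sequence. Assumes the input is valid UTF-8.
std::string truncate_utf8(const std::string& utf8, std::size_t max_bytes) {
    if (utf8.size() <= max_bytes) {
        return utf8;
    }
    std::size_t end = max_bytes;
    assert(end < utf8.size());
    // Back up while the first byte that would be cut off is a continuation byte.
    while (end > 0 && (static_cast<unsigned char>(utf8[end]) & 0xC0) == 0x80) {
        --end;
    }
    return utf8.substr(0, end);
}
```

For "naïve" stored as UTF-8, size() reports 6 bytes while count_code_points reports 5, and truncate_utf8 with a limit of 3 returns "na" rather than leaving a lone lead byte. Code points are still not the same as user-perceived characters (combining marks, for instance), so this only covers the byte-level part of the problem.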

Or is there some gotcha that is waiting to trip me?

AndyK
