I've just started doing some Windows programming.
I'm trying to decide how best to handle non-ASCII text.
I'd prefer to use 8-bit characters rather than 16-bit, i.e. declare all my strings as char.
I've read the UTF-8 Everywhere proposals, and I think they misrepresent the current state of Windows.
Since Windows 10 version 1803 (10.0.17134.0), support for a UTF-8 code page has been implemented to the same standard as other multibyte character encodings.
I think that I can now:
- Make Visual Studio store source code as UTF-8 by using an EditorConfig file, and make the compiler treat source and string literals as UTF-8 by specifying '/utf-8' as an "additional" option in C/C++/Command Line.
- Make sure the system knows the program is using UTF-8 character strings by calling
  setlocale(LC_ALL,".UTF-8");
  and/or setting
  <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
  in the manifest. (The system will actually expect UTF-8 by default if 'Beta: Use Unicode UTF-8 for worldwide language support' is ticked in Region/Language/Administrative language settings/Region Settings. I believe this sets the active code page to UTF-8, and is the default for Windows 11.)
- Leave UNICODE and _UNICODE undefined in source, and so use the Win32 'ANSI' interfaces. Windows will convert any text to UTF-16 internally.
- Use the standard strings and char variables I'm used to, rather than wstring and wchar_t.
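To make the plan above concrete, here is a minimal sketch of the startup check I have in mind. The helper names (active_code_page_is_utf8, use_utf8_crt_locale, utf8_sample) are my own and purely illustrative; GetACP, CP_UTF8 and setlocale are the real calls. The non-Windows branch is only there so the sketch compiles elsewhere, and I'm assuming the string literal is written with explicit byte escapes so it does not depend on the '/utf-8' setting.

```cpp
#include <clocale>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true if the process's active (ANSI) code page is UTF-8,
// which is what the activeCodePage manifest entry is meant to achieve.
bool active_code_page_is_utf8() {
#ifdef _WIN32
    return GetACP() == CP_UTF8;  // CP_UTF8 is code page 65001
#else
    return true;  // assumption: only here so the sketch builds off Windows
#endif
}

// Hypothetical helper: asks the C runtime to use a UTF-8 locale.
// setlocale returns a null pointer if the request cannot be honoured.
bool use_utf8_crt_locale() {
    return std::setlocale(LC_ALL, ".UTF-8") != nullptr;
}

// "café" as UTF-8 bytes in an ordinary char string: the 'é' takes two bytes.
std::string utf8_sample() {
    return "caf\xC3\xA9";
}
```

The idea would be to call both helpers at the top of main and refuse to continue (or fall back to explicit conversion) if either returns false, before passing strings such as utf8_sample() to any 'A' function.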
Have I got this right?
Is there anything else I need to do, apart from watching out for any code that in some way depends on a single character being held in a single byte?
Or is there some gotcha that is waiting to trip me?
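As an example of the single-byte assumption I mean: size() on a std::string counts bytes, not characters. Below is a minimal sketch of counting code points instead. The function name utf8_code_points is my own, and it assumes the input is already valid UTF-8 (it does no validation and does not handle combining sequences).

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes valid UTF-8 input; malformed sequences are not detected.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For "caf\xC3\xA9" (café), size() is 5 but utf8_code_points gives 4, so anything that indexes, truncates or pads by byte count would need the same care.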