Visual C++ - UTF-8 - CA2W followed by CW2T with MBCS - Possibly a bad idea?

Question

I'm using a library that produces UTF-8 null-terminated strings in the const char* type. Examples include:

MIGUEL ANTÃ“NIO
DONA ESTEFÃ‚NIA

I'd like to convert those two const char* strings to CString so that they read:

MIGUEL ANTÓNIO
DONA ESTEFÂNIA

To that effect, I'm using the following function I made:

CString Utf8StringToCString(const char * s)
{
    CStringW ws = CA2W(s, CP_UTF8);
    return CW2T(ws);
}

The function seems to do what I want (at least for those 2 cases). However, I'm wondering: is it a good idea at all to use the CA2W macro, followed by CW2T? Am I doing some sort of lossy conversion by doing this? Are there any side-effects I should worry about?

Some other details:

I'm using Visual Studio 2015
My application is compiled with the "Use Multi-Byte Character Set" project setting

CW2T will lose any Unicode characters that can't be represented with your code page. — john, Mar 13 '19 at 10:07
Why are your project settings still on MBCS? Do you still have customers on Windows 98? — selbie, Mar 13 '19 at 10:45
@selbie Um... Honestly, I don't know. All of our customers use Windows 8, but I believe this is a legacy application. From what I understand, changing the setting to Unicode might break existing code... — Miguel Lopes Martins, Mar 13 '19 at 11:20
@MiguelLopesMartins - I figured as much. It's easy to work around though. See my answer below. — selbie, Mar 13 '19 at 11:22

score 1 · Answer 1 · answered Mar 13 '19 at 10:53

Even if your application is compiled as MBCS, you can still use Unicode strings, buffers, and Windows Unicode APIs without any issue.

Pass your strings around as UTF-8, either with a raw pointer (const char*) or in a string class such as CString or std::string. When you actually need to render the string for display, convert it to UTF-16 (a wide string) and call the W version of the API explicitly, so the text never goes through the ANSI code page.

For example:

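void UpdateDisplayText(const char* s)
{
    // Convert UTF-8 to UTF-16; no characters are lost in this step.
    CStringW ws = CA2W(s, CP_UTF8);
    // Call the W API directly, even though the project is built as MBCS.
    SetDlgItemTextW(m_hWnd, IDC_LABEL1, ws);
}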
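
If you do keep the CW2T step, you may want to know in advance whether a given string would survive it. Here is a minimal, portable sketch of that check, not what ATL does internally. The helper names DecodeUtf8 and SurvivesLatin1 are hypothetical, and the code approximates the ANSI code page by Latin-1 (U+0000 to U+00FF), which is only an assumption about your system; Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range, so treat the result as a rough guide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into Unicode code points.
// Simplified sketch: returns false on truncated or malformed sequences,
// but does not reject overlong encodings or surrogate code points.
bool DecodeUtf8(const std::string& s, std::vector<char32_t>& out)
{
    out.clear();
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false; // stray continuation byte or invalid lead byte

        if (i + extra >= s.size()) return false; // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}

// ASSUMPTION: the target code page is approximated by Latin-1.
// Returns true when every code point is in U+0000..U+00FF.
bool SurvivesLatin1(const std::string& utf8)
{
    std::vector<char32_t> codePoints;
    if (!DecodeUtf8(utf8, codePoints)) return false;
    for (char32_t cp : codePoints) {
        if (cp > 0xFF) return false; // outside Latin-1
    }
    return true;
}
```

Under this approximation both sample names from the question pass the check, whereas a string containing, for instance, Japanese text would not. On Windows itself, WideCharToMultiByte can report the same condition for the real code page through its lpUsedDefaultChar output parameter.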
