I'm using a library that produces null-terminated UTF-8 strings as const char*. Examples include:

MIGUEL ANTÓNIO
DONA ESTEFÂNIA

I'd like to convert those two const char* strings to CString so that they read:

MIGUEL ANTÓNIO
DONA ESTEFÂNIA

To that effect, I'm using the following function I made:

CString Utf8StringToCString(const char * s)
{
    // UTF-8 -> UTF-16
    CStringW ws = CA2W(s, CP_UTF8);
    // UTF-16 -> TCHAR (char in an MBCS build, wchar_t in a Unicode build)
    return CW2T(ws);
}

The function seems to do what I want (at least for those 2 cases). However, I'm wondering: is it a good idea at all to use the CA2W macro, followed by CW2T? Am I doing some sort of lossy conversion by doing this? Are there any side-effects I should worry about?

Some other details:

  1. I'm using Visual Studio 2015
  2. My application is compiled with the "Use Multi-Byte Character Set" option
  • CW2T will lose any Unicode characters that can't be represented with your code page. – john Mar 13 '19 at 10:07
  • Why are your project settings still on MBCS? Do you still have customers on Windows 98? – selbie Mar 13 '19 at 10:45
  • @selbie Um... Honestly, I don't know. All of our customers use Windows 8, but I believe this is a legacy application. From what I understand, changing the setting to Unicode might break existing code... – Miguel Lopes Martins Mar 13 '19 at 11:20
  • @MiguelLopesMartins - I figured as much. It's easy to work around though. See my answer below. – selbie Mar 13 '19 at 11:22
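
To make john's point about CW2T concrete: in an MBCS build the second conversion targets the system's ANSI code page, so any character that code page lacks cannot survive. The following is a minimal, portable sketch of that idea, not the ATL implementation. It decodes UTF-8 by hand and checks whether every code point fits in one byte. DecodeUtf8 and FitsInLatin1 are hypothetical helper names, and Latin-1 (U+0000 to U+00FF) is assumed as a simplified stand-in for the real ANSI code page, which differs in places (Windows-1252, for instance, assigns other characters to the 0x80 to 0x9F range). On Windows itself, the check that actually applies is whether WideCharToMultiByte reports a substitution through its lpUsedDefaultChar output parameter.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 into Unicode code points.
// Returns false on truncated or structurally invalid input.
// Sketch only: overlong forms and surrogate code points are not rejected.
bool DecodeUtf8(const std::string& s, std::vector<std::uint32_t>& out)
{
    out.clear();
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + extra >= s.size()) return false;  // sequence is cut short

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: true if every code point fits in a single Latin-1
// byte. ASSUMPTION: Latin-1 approximates the ANSI code page; the real
// answer depends on the code page active on the machine.
bool FitsInLatin1(const std::string& utf8)
{
    std::vector<std::uint32_t> cps;
    if (!DecodeUtf8(utf8, cps)) return false;
    for (std::uint32_t cp : cps) {
        if (cp > 0xFF) return false;  // would need a substitute character
    }
    return true;
}
```

Under that assumption both example names pass the check, since Ó (U+00D3) and Â (U+00C2) are below U+0100, whereas a string containing Ł (U+0141) would not.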

1 Answer

Even if your application is compiled as MBCS, you can still use Unicode strings, buffers, and Windows Unicode APIs without any issue.

Pass your strings around as UTF-8, either as a raw pointer (const char*) or in a string class such as CString or std::string. When you actually need to render the string for display, convert it to UTF-16 (CStringW) and call the W version of the API explicitly.

For example:

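// Assumed to be a member of a window or dialog class that provides m_hWnd.
void UpdateDisplayText(const char* s)
{
    // Convert UTF-8 to UTF-16; no ANSI code page is involved.
    CStringW ws = CA2W(s, CP_UTF8);
    // Call the wide-character API directly, regardless of the MBCS setting.
    SetDlgItemTextW(m_hWnd, IDC_LABEL1, ws);
}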
selbie