Trying to convert from char* to wchar_t*, but it doesn't work with non-English languages

Question

I want to convert a string from char* to wchar_t*, but the string may be in a language other than English (for example Russian, Chinese, or Arabic). I have tried to do it as follows:

#include <cstdlib>   // mbstowcs
#include <iostream>  // std::wcout

using std::wcout;

// This is just an example of conversion.
// Note: mbstowcs returns (size_t)-1 on an invalid sequence, which is not
// checked here, and the buffer allocated with new[] is never freed.
const wchar_t * ToWide(const char* mbStr) {
    const size_t cSize = mbstowcs(NULL, mbStr, 0) + 1;
    wchar_t* wc = new wchar_t[cSize];
    mbstowcs(wc, mbStr, cSize);
    return wc;
}

int main() {
    // only the first one works correctly
    wcout << ToWide("Hello");  // (English) The result: Hello
    wcout << ToWide("Привет"); // (Russian) The result: ???????
    wcout << ToWide("你好");    // (Chinese) The result: ??
    wcout << ToWide("مرحبا");   // (Arabic) The result: ع╤═╚╟
}

Why does this happen, and what is the right way to convert from char* to wchar_t*?
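A minimal sketch of the locale-based fix that the comments point to: switch the C locale to a UTF-8 locale before calling `mbstowcs`, so the narrow side is decoded as UTF-8 rather than with the default "C" locale. It assumes the narrow strings really are UTF-8 and that a UTF-8 locale is installed; locale names such as "C.UTF-8" or "en_US.UTF-8" are platform-specific assumptions. `to_wide` is a hypothetical name, and returning `std::wstring` avoids the leaked `new[]` buffer of the question's `ToWide`.

```cpp
#include <clocale>    // std::setlocale, LC_ALL
#include <cstdlib>    // std::mbstowcs
#include <stdexcept>  // std::runtime_error
#include <string>     // std::wstring

// Converts a narrow string in the *current C locale's* multibyte encoding to a
// wide string. Assumption: the caller has already used std::setlocale to pick
// a UTF-8 locale; in the default "C" locale only plain ASCII is safe.
std::wstring to_wide(const char* mb) {
    // First pass: count the wide characters (terminator not included).
    const std::size_t len = std::mbstowcs(nullptr, mb, 0);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    }
    std::wstring out(len, L'\0');
    // Second pass: convert directly into the string's buffer.
    // std::wstring owns the memory, so there is nothing to delete[] later.
    std::mbstowcs(&out[0], mb, len);
    return out;
}
```

Typical use: call `std::setlocale(LC_ALL, "C.UTF-8")` (or `std::setlocale(LC_ALL, "")` to take the locale from the environment) once at the start of `main`, then call `to_wide`. Whether `wcout` then displays the text correctly also depends on the stream and console setup, which this sketch does not address.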

@dxiv: Do you mean the `char` array is not large enough for a non-English string? — Lion King, May 04 '20 at 02:29
I mean that a `char` is an 8-bit character in most implementations, which only has room for 256 values; that is certainly not enough to cover all languages. If you want to use some encoding (UTF-8, maybe), then you have to set the appropriate locale first; see e.g. the example at https://en.cppreference.com/w/cpp/string/multibyte/mbstowcs. — dxiv, May 04 '20 at 02:33
The behaviour of embedding non-basic text in the source code depends on your compiler and environment. To improve portability, use Unicode escape sequences instead. — M.M, May 04 '20 at 02:34
@dxiv The code in the question already uses `mbstowcs`. C++ supports characters outside the basic source set in source code and runtime data. — aschepler, May 04 '20 at 02:35
@dxiv: Yes, I want to take a UTF-8 string and convert it to UTF-16. — Lion King, May 04 '20 at 02:35
@aschepler Right, but the narrow side of it is still using the default locale, not necessarily UTF-8. — dxiv, May 04 '20 at 02:37
The simplest way to force UTF-8 encoding is to use the `u8` prefix: https://en.cppreference.com/w/cpp/language/string_literal — phuclv, May 04 '20 at 04:16
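Following M.M's suggestion, a small sketch that spells the three sample words with universal character names (`\uXXXX`), so the stored values no longer depend on how the compiler decodes the source file. The constant names are hypothetical. Every character used here is in the Basic Multilingual Plane, so each occupies exactly one `wchar_t` whether `wchar_t` is 16 bits (Windows) or 32 bits (typical Linux).

```cpp
#include <cwchar>  // std::wcslen

// Each \uXXXX names one Unicode code point explicitly; the source file's
// encoding no longer affects what ends up in the wide literals.
const wchar_t* const kRussian = L"\u041F\u0440\u0438\u0432\u0435\u0442";  // Привет
const wchar_t* const kChinese = L"\u4F60\u597D";                           // 你好
const wchar_t* const kArabic  = L"\u0645\u0631\u062D\u0628\u0627";         // مرحبا
```

This sidesteps the char* to wchar_t* conversion entirely when the text is a literal; printing it with `wcout` still needs a suitable locale and console.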
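On the UTF-8 to UTF-16 goal Lion King mentions: `wchar_t` holds UTF-16 only on Windows; on Linux it is usually 32 bits wide. The sketch below swaps in a different technique from the question's `mbstowcs`: a hand-written decoder into `std::u16string` that does not consult the C locale at all. `utf8_to_utf16` is a hypothetical helper, not a standard function. On Windows, `MultiByteToWideChar` with `CP_UTF8` is the platform API for the same job, and the standard `std::wstring_convert` with `std::codecvt_utf8_utf16` also exists but has been deprecated since C++17.

```cpp
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::invalid_argument
#include <string>     // std::string, std::u16string

// Decodes UTF-8 bytes into UTF-16 code units, independent of the C locale.
// Rejects truncated sequences, bad continuation bytes, overlong encodings,
// surrogate code points and values above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const char32_t kMinForLength[] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes following the lead byte
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            extra = 3;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (extra >= in.size() - i) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < kMinForLength[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid UTF-8 code point");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points outside the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

Because it throws on malformed input instead of returning `(size_t)-1`, a bad byte sequence cannot silently turn into a zero-sized buffer the way it can in the question's `ToWide`.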
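To illustrate phuclv's point: a `u8` literal is encoded as UTF-8 regardless of the compiler's default narrow execution character set. A minimal sketch, written with `\u` escapes so the source file's encoding does not matter either; `kHelloZh` is a hypothetical name. One caveat: since C++20 the element type of a `u8` literal is `char8_t` rather than `char`, so passing it to `mbstowcs` there needs a `reinterpret_cast<const char*>`.

```cpp
#include <cstddef>  // std::size_t

// u8 forces UTF-8. The element type is char in C++17 but char8_t in C++20,
// so `auto` keeps this declaration valid under both standards.
constexpr auto kHelloZh = u8"\u4F60\u597D";  // 你好, 3 UTF-8 bytes per character

// sizeof covers the whole array: 2 characters * 3 bytes + 1 terminating NUL.
static_assert(sizeof(u8"\u4F60\u597D") == 7, "u8 literal should be UTF-8 encoded");
```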