I want to convert a string from char* to wchar_t*, but the string is in a language other than English (for example Russian, Chinese, or Arabic). I tried the following:

#include <cstdlib>   // mbstowcs
#include <iostream>  // wcout
using std::wcout;

// This is just an example of the conversion
const wchar_t * ToWide(const char* mbStr) {
    const size_t cSize = mbstowcs(NULL, mbStr, 0) + 1;
    wchar_t* wc = new wchar_t[cSize];
    mbstowcs(wc, mbStr, cSize);
    return wc;
}

int main() {
    // Only the first call works as expected
    wcout << ToWide("Hello");  // (English) The result: Hello
    wcout << ToWide("Привет"); // (Russian) The result: ???????
    wcout << ToWide("你好");    // (Chinese) The result: ??
    wcout << ToWide("مرحبا");   // (Arabic) The result: ع╤═╚╟
}

Why does this happen, and what is the right way to convert from char* to wchar_t*?

Lion King
  • How do you store non-English strings in a *char* array? – dxiv May 04 '20 at 02:25
  • @dxiv: Do you mean the size of `char` array is not enough for non-English string? – Lion King May 04 '20 at 02:29
  • I mean that a `char` is an 8-bit character in most implementations, which only has room for 256 values, and that is certainly not enough to cover all languages. If you want to use some encoding (UTF-8 maybe), then you have to set the appropriate locale first; see, e.g., the example at https://en.cppreference.com/w/cpp/string/multibyte/mbstowcs – dxiv May 04 '20 at 02:33
  • The behaviour of embedding non-basic text in the source code depends on your compiler and environment. To improve portability, use Unicode escape sequences instead. – M.M May 04 '20 at 02:34
  • @dxiv The code in the question already uses `mbstowcs`. C++ supports characters outside the basic source set in source code and runtime data. – aschepler May 04 '20 at 02:35
  • Also, the behaviour of `wcout` depends on the environment. – M.M May 04 '20 at 02:35
  • @dxiv: Yes, I want to use "UTF8" string and convert it to "utf16". – Lion King May 04 '20 at 02:35
  • @aschepler Right, but the narrow side of it is still using the default locale, not necessarily UTF-8. – dxiv May 04 '20 at 02:37
  • The simplest way to force UTF-8 encoding is to use the `u8` prefix: https://en.cppreference.com/w/cpp/language/string_literal – phuclv May 04 '20 at 04:16
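
Following dxiv's suggestion in the comments, here is a minimal sketch of setting a UTF-8 locale before calling `mbstowcs`. The helper name `UseUtf8Locale` and the list of locale names are assumptions for illustration; which names are actually installed varies by system, so the helper reports whether any of them could be selected. `ToWide` is reworked here to return a `std::wstring` and to check the `(size_t)-1` error value that `mbstowcs` returns for an invalid sequence. The test string is written with explicit UTF-8 byte escapes so that it does not depend on the source file encoding.

```cpp
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Try a few common UTF-8 locale names (assumed; availability varies by system).
// Returns true if one of them could be selected for LC_CTYPE.
bool UseUtf8Locale() {
    const char* candidates[] = {"C.UTF-8", "en_US.UTF-8", "en_US.utf8", ".UTF-8"};
    for (const char* name : candidates) {
        if (std::setlocale(LC_CTYPE, name) != nullptr) {
            return true;
        }
    }
    return false;
}

// Convert a multibyte string to a wide string using the current C locale.
// Throws if the input is not valid in that locale's encoding.
std::wstring ToWide(const char* mbStr) {
    const std::size_t len = std::mbstowcs(nullptr, mbStr, 0);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    }
    std::wstring wide(len, L'\0');
    std::mbstowcs(&wide[0], mbStr, len);  // fills exactly len wide characters
    return wide;
}

// "Привет" spelled out as UTF-8 bytes, independent of the source encoding.
const char* const kPrivetUtf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
```

Calling `UseUtf8Locale()` once at the start of `main` and then `ToWide(kPrivetUtf8)` should give six wide characters when a UTF-8 locale is available. Note that `wchar_t` is 16 bits (UTF-16) on Windows but typically 32 bits on Linux, and, as M.M points out, whether `wcout` then displays the text correctly still depends on the environment.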
