
I have a function that takes a std::string and converts it into a wchar_t*. My current widen function looks like this:

wchar_t* widen(const std::string& str){  
    wchar_t * dest = new wchar_t[str.size()+1];
    char * temp = new char[str.size()];
    for(int i=0;i<str.size();i++)
        dest[i] = str[i];
    dest[str.size()] = '\0';
    return dest;
}  

This works just fine for standard characters, however (and I cannot believe this hasn't been an issue before now) when I have characters like á, é, í, ó, ú, ñ, or ü it breaks and the results are vastly different.
Ex: my str comes in as "Database Function: áFákéFúnctíóñü"
But dest ends up as: "Database Function: £F£k←Fnct■￳￱"

How can I change from a std::string to a wchar_t* while maintaining international characters?

WWZee
  • Just a note on your code: In most cases, when you're using manual memory management (i.e. not using smart pointers) or when you are using the vector `new`, you are doing something wrong. Use standard containers (vector, string, wstring) here. – Ulrich Eckhardt Jul 12 '18 at 20:09
  • @UlrichEckhardt, appreciate the note. This particular project is between 10-14 years old in most places, I've only been blessed to wade through it for the past year or so, and even more unfortunately, due to backwards compatibility necessity I can't fix old stuff that is broken – WWZee Jul 13 '18 at 15:55

3 Answers

3

Short answer: You can't.

Longer answer: std::string holds char elements. These typically contain ASCII in the first 128 values (0 to 127), while everything else ("international characters") lives in the values above that (or in the negative ones, if char is signed). To determine the corresponding representation in a wchar_t string, you first need to know the encoding of the source string (it could be ISO-8859-15 or even UTF-8) and the encoding of the target string (often UTF-16, UCS-2 or UTF-32), and then transcode accordingly.
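To make "know the encoding, then transcode" concrete, here is a minimal sketch for the simplest possible case. It assumes the source bytes are ISO-8859-1 (Latin-1), where every byte value equals the Unicode code point of the same number. The helper name widen_latin1 is hypothetical, and the sketch does not apply to ISO-8859-15, Windows-1252 or UTF-8 input, which need a real mapping table or decoder.

```cpp
#include <string>

// Hypothetical helper (not from the question): widens a string that is
// ASSUMED to be ISO-8859-1 (Latin-1) encoded. In that encoding each byte
// value is also the Unicode code point, so no lookup table is needed.
std::wstring widen_latin1(const std::string& src)
{
    std::wstring dst;
    dst.reserve(src.size());
    for (char c : src) {
        // Convert through unsigned char first: where char is signed, bytes
        // >= 0x80 are negative and would sign-extend to the wrong value.
        dst.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return dst;
}
```

Returning a std::wstring instead of a raw wchar_t* avoids the manual new[]; callers that need a pointer can use c_str() while the string is alive.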

Ulrich Eckhardt
  • Yes, "know the encoding in the source string…and the one in the target string…and then transcode accordingly." This makes any similarity to ASCII—and even the existence of ASCII—irrelevant. – Tom Blodget Jul 13 '18 at 00:07
1

It depends on whether the source uses a legacy ANSI code page or UTF-8. For an ANSI code page, you have to know the locale and use mbstowcs. For UTF-8, you can convert to UTF-16 using codecvt_utf8_utf16. However, codecvt_utf8_utf16 is deprecated (since C++17) and has no standard replacement yet. On Windows, you can use the WinAPI conversion functions (such as MultiByteToWideChar) to do the conversion more reliably.

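#include <clocale>
#include <codecvt>
#include <cstdlib>
#include <iostream>
#include <locale>
#include <string>

std::wstring widen(const std::string& src)
{
    // mbstowcs never produces more wide characters than there are source bytes.
    std::wstring dst(src.size(), L'\0');
    // Interprets src according to the current C locale (LC_CTYPE).
    std::size_t n = std::mbstowcs(&dst[0], src.c_str(), dst.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring(); // src is not valid in the current locale
    dst.resize(n); // drop the unused tail
    return dst;
}

int main()
{
    //ANSI code page? (only true if this file is saved in that code page)
    std::string src = "áFákéFúnctíóñü";
    std::setlocale(LC_ALL, "en"); //English assumed; locale names are platform-specific
    std::wstring dst = widen(src);
    std::wcout << dst << "\n";

    //UTF8? (before C++20; from C++20 on, u8 literals are char8_t)
    src = u8"áFákéFúnctíóñü";
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    dst = convert.from_bytes(src);
    std::wcout << dst << "\n";

    return 0;
}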
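Since the codecvt route is deprecated, one alternative is to decode UTF-8 by hand. The following is only a sketch under stated assumptions: the input is assumed to be UTF-8, the helper name widen_utf8 is hypothetical, and it throws std::invalid_argument on malformed input rather than substituting a replacement character. It emits UTF-16 where wchar_t is 16 bits wide and UTF-32 otherwise.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes src, ASSUMED to be UTF-8, into a wide string.
std::wstring widen_utf8(const std::string& src)
{
    std::wstring dst;
    dst.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char lead = static_cast<unsigned char>(src[i]);
        char32_t cp;        // code point being decoded
        std::size_t extra;  // number of continuation bytes expected
        char32_t lowest;    // smallest code point valid for this length
        if (lead < 0x80)                { cp = lead;        extra = 0; lowest = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; lowest = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > src.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(src[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            dst.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            dst.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            dst.push_back(static_cast<wchar_t>(cp));
        }
    }
    return dst;
}
```

A call such as widen_utf8(src) could stand in for convert.from_bytes(src) when the source is known to be UTF-8. A maintained library is still the safer choice for production code.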
Barmak Shemirani
0

For a Windows solution, here are some utility functions I use, based on the wisdom of http://utf8everywhere.org/

#include <windows.h>

#include <string>
#include <string_view>

/// Convert a Windows UTF-16 string to a UTF-8 string
///
/// @param wstr[in] the UTF-16 string
/// @return std::string UTF-8 string
inline std::string Narrow(std::wstring_view wstr) {
  if (wstr.empty()) return {};
  const int size = static_cast<int>(wstr.size());
  // The first call only computes the required buffer size in bytes.
  int len = ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), size, nullptr, 0,
                                  nullptr, nullptr);
  std::string out(len, 0);
  ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), size, &out[0], len,
                        nullptr, nullptr);
  return out;
}

/// Convert a UTF-8 string to a Windows UTF-16 string
///
/// @param str[in] the UTF-8 string
/// @return std::wstring UTF-16 string
inline std::wstring Widen(std::string_view str) {
  if (str.empty()) return {};
  const int size = static_cast<int>(str.size());
  // The first call only computes the required buffer size in wchar_t units.
  int len = ::MultiByteToWideChar(CP_UTF8, 0, str.data(), size, nullptr, 0);
  std::wstring out(len, 0);
  ::MultiByteToWideChar(CP_UTF8, 0, str.data(), size, &out[0], len);
  return out;
}

These are usually used inline in Windows API calls, like:

std::string message = "Hello world!";
::MessageBoxW(NULL, Widen(message).c_str(), L"Title", MB_OK);

A cross-platform and possibly faster solution could be found by exploring Boost.Nowide's conversion functions: https://github.com/boostorg/nowide/blob/develop/include/boost/nowide/utf/convert.hpp

MHebes