I haven't used gcc in a mingw environment on Windows, but from what I gather it doesn't support C++ locales.
Since it doesn't support C++ locales this isn't really relevant, but FYI, Windows doesn't use the same locale naming scheme as most other platforms. They use a similar language_country.encoding, but the language and country are not codes, and the encoding is a Windows code page number. So the locale would be "English_United States.65001", however this is not a supported combination (code page 65001 (UTF-8) isn't supported as part of any locale).
The reason that only ws1
prints, and only once is that when the character \u20AC
is printed, the stream fails and the fail bit is set. You have to clear the error before anything further will be printed.
C++11 introduced some things that will portably deal with UTF-8, but not everything is supported yet, and the additions don't completely solve the problem. But here's the way things currently stand:
When char16_t
and char32_t
are supported in VS as native types rather than typedefs you will be able to use the standard codecvt facet specializations codecvt<char16_t,char,mbstate_t>
and codecvt<char32_t,char,mbstate_t>
which are required to convert between UTF-16 or UTF-32 respectively, and UTF-8 (rather than the execution charset or system encoding). This doesn't work yet because in the current VS (and in VS11DP) these types are only typedefs and template specializations don't work on typedefs, but the code is already in the headers in VS 2010, just protected behind an #ifdef
.
The standard also defines some special purpose codecvt facet templates which are supported, codecvt_utf8, and codecvt_utf8_utf16. The former converts between UTF-8 and either UCS-2 or UCS-4 depending on the size of the wide char type you use, and the latter converts between UTF-8 and UTF-16 code units independent of the size of the wide char type.
std::wcout.imbue(std::locale(std::locale::classic(),new std::codecvt_utf8_utf16<wchar_t>()));
std::wcout << L"ØÀéîðüýþ\n";
This will output UTF-8 code units through whatever is attached to wcout. If output has been redirected to file then opening it will show a UTF-8 encoded file. However, because of the console model on Windows, and the way the standard streams are implemented, you will not get correct display of Unicode characters in the command prompt this way (even if you set the console output code page to UTF-8 with SetConsoleOutputCP(CP_UTF8)
). The UTF-8 code units are output one at a time, and the console will look at each individual chunk passed to it expecting each chunk (i.e. single byte in this case) passed to be complete and valid encodings. Incomplete or invalid sequences in the chunk (every byte of all multibyte character representations in this case) will be replaced with U+FFFD when the string is displayed.
If instead of using iostreams you use the C function puts
to write out an entire UTF-8 encoded string (and if the console output code page is correctly set) then you can print a UTF-8 string and have it displayed in the console. The same codecvt facets can be used with some other C++11 convinence classes to do this:
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
puts(convert(L"ØÀéîðüýþ\n).to_bytes().c_str());
The above is still not quite portable, because it assumes that wchar_t is UTF-16, which is the case on Windows but not on most other platforms, and it is not required by the standard. (In fact my understanding is that it's not technically conforming because UTF-16 needs multiple code units to represent some characters and the standard requires that all characters in the chosen encoding must be representable in a single wchar_t).
std::wstring_convert<std::codecvt_utf8<wchar_t>,wchar_t> convert;
The above will portably handle UCS-4 and USC-2, but won't work outside the Basic Multilingual Plane on platforms using UTF-16.
You could use the conditional
type trait to select between these two facets based on the size of wchar_t
and get something that mostly works:
std::wstring_convert<
std::conditional<sizeof(wchar_t)==2,std::codecvt_utf8_utf16<wchar_t>,
std::codecvt_utf8<wchar_t>
>::type,
wchar_t
> convert;
Or just use preprocessor macros to define an appropriate typedef, if your coding standards allow macros.