General question
Is there a way to avoid character set conversion when writing to std::cout / std::cerr?
I do something like
std::cout << "Ȋ'ɱ ȁ ȖȚƑ-8 Șțȓȉɳɠ (in UTF-8 encoding)" << std::endl;
And I want the output to be written to the console with its UTF-8 encoding preserved (my console uses UTF-8, but my C++ standard library, GNU's libstdc++, doesn't think so for some reason).
If conversion cannot be disabled: can I set std::cout to use UTF-8, so that it hopefully figures out by itself that no conversion is needed?
Background
I used the Windows API function SetConsoleOutputCP(CP_UTF8); to set my console's encoding to UTF-8.
The problem seems to be that UTF-8 does not match the code page typically used for my system's locale, and libstdc++ therefore sets up std::cout with the default ANSI code page instead of correctly recognizing the switch.
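For reference, the console setup described above might look roughly like the following sketch. The helper name enable_utf8_console is made up for illustration; only SetConsoleOutputCP and CP_UTF8 come from the Windows API. On non-Windows platforms the helper is assumed to be a no-op that simply reports success.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of any library): asks the Windows console
// to interpret the bytes written to it as UTF-8.
// Returns true on success. On non-Windows platforms nothing is changed and
// true is returned, on the assumption that the terminal already uses UTF-8.
bool enable_utf8_console() {
#ifdef _WIN32
    // SetConsoleOutputCP returns a nonzero value on success.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

It would typically be called once at the start of main(), before anything is written to std::cout. Note that this only changes how the console interprets incoming bytes; it does not change which bytes a library decides to write.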
Edit: Turns out I misinterpreted the issue and the solution is actually a lot simpler (or not...).
The "Ȋ'ɱ ȁ ȖȚƑ-8 Șțȓȉɳɠ (in UTF-8 encoding)" string was just meant as a placeholder (and I shouldn't have used it, as it hid the actual issue).
In my real code the "UTF-8 string" is a Glib::ustring, and those are by definition UTF-8 encoded.
However, I did not realize that the output operator << is defined in glibmm in a way that forces character set conversion.
It uses g_locale_from_utf8() internally, which in turn uses g_get_charset() to determine the target encoding.
Unfortunately, the documentation for g_get_charset() states:
On Windows the character set returned by this function is the so-called system default ANSI code-page. That is the character set used by the "narrow" versions of C library and Win32 functions that handle file names. It might be different from the character set used by the C library's current locale.
This simply means that glib neither respects the C locale I set nor attempts to determine the encoding my console actually uses, which basically makes it impossible to use many glib functions to produce UTF-8 output. (As a matter of fact, this also means that this issue has the exact same cause as the one that triggered my other question: Force UTF-8 encoding in glib's "g_print()".)
I'm currently considering this a bug in glib (or a serious limitation at best) and will probably open a report in the issue tracker for it.