Consider the following code snippet, compiled as a Console Application in MS Visual Studio 2010/2012 and executed on Windows 7:

#include "stdafx.h"
#include <iostream>
#include <string>


const std::wstring test = L"hello\xf021test!";

int _tmain(int argc, _TCHAR* argv[])
{
    std::wcout << test << std::endl;
    std::wcout << L"This doesn't print either" << std::endl;

    return 0;
}

The first wcout statement outputs "hello" (instead of something like "hello?test!"). The second wcout statement outputs nothing.

It's as if 0xf021 (and other?) Unicode characters cause wcout to fail.

This particular Unicode character, 0xf021 (encoded as UTF-16), is part of the "Private Use Area" in the Basic Multilingual Plane. I've noticed that Windows Console applications do not have extensive support for Unicode characters, but typically each character is at least represented by a default character (e.g. "?"), even if there is no support for rendering a particular glyph.

What is causing the wcout stream to choke? Is there a way to reset it after it enters this state?

charunnera

2 Answers

wcout, or to be precise, a wfilebuf instance it uses internally, converts wide characters to narrow characters, then writes those to the file (in your case, to stdout). The conversion is performed by the codecvt facet in the stream's locale; by default, that just does wctomb_s, converting to the system default ANSI codepage, aka CP_ACP.
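
To illustrate the conversion step described above, here is a minimal sketch. It assumes the standard `std::wcrtomb` as a portable stand-in for the `wctomb_s` call mentioned in the answer, and it checks against whatever C locale is current rather than the Windows ANSI codepage. The helper name `is_representable` is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns true if `wc` can be converted to a narrow
// (multibyte) sequence in the current C locale. std::wcrtomb reports a
// character it cannot encode by returning (size_t)-1.
bool is_representable(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    std::mbstate_t state = std::mbstate_t();  // start in the initial shift state
    return std::wcrtomb(buf, wc, &state) != static_cast<std::size_t>(-1);
}
```

In the default "C" locale, `is_representable(L'h')` should be true, while a Private Use Area character such as `L'\xf021'` is expected to be rejected. This is the same kind of failure that the stream's codecvt facet runs into.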

Apparently, character '\xf021' is not representable in the default codepage configured on your system. So the conversion fails, and failbit is set in the stream. Once failbit is set, all subsequent calls fail immediately.
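
The "sticky" error state can be reproduced without a console. The sketch below assumes a `std::wostringstream` in place of `wcout` and sets `failbit` by hand to stand in for the failed conversion. The function name `write_with_failure` is made up for illustration.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: writes `first`, forces the stream into a failed state
// (simulating a conversion error), then attempts to write `second`.
// If `reset` is true, clear() is called before the second write.
std::wstring write_with_failure(const std::wstring& first,
                                const std::wstring& second,
                                bool reset)
{
    std::wostringstream out;
    out << first;
    out.setstate(std::ios_base::failbit);  // simulated failure, set manually
    if (reset)
        out.clear();                       // drop the error flags
    out << second;                         // ignored while the stream is failed
    return out.str();
}
```

With `reset == false` only the first string reaches the buffer, which mirrors the second `wcout` statement in the question printing nothing. With `reset == true` both strings are written.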

I do not know of any way to get wcout to successfully print arbitrary Unicode characters to console. wprintf works though, with a little tweak:

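#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, wprintf
#include <tchar.h>   // _tmain, _TCHAR
#include <string>

const std::wstring test = L"hello\xf021test!";

int _tmain(int argc, _TCHAR* argv[])
{
  // Switch stdout to UTF-16 mode so wide output is not converted to the ANSI codepage.
  _setmode(_fileno(stdout), _O_U16TEXT);
  // Pass the text as an argument rather than as the format string,
  // so any '%' in the data is not interpreted as a conversion specifier.
  wprintf(L"%ls", test.c_str());

  return 0;
}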
Igor Tandetnik
  • You could try `imbue`ing the stream with a utf16 facet. Dunno if that'll actually work. – Raymond Chen Oct 05 '13 at 14:56
  • Tried that, couldn't make it work. `wcout` uses the facet to convert the wide string to a sequence of bytes (which for `codecvt_utf16` is basically a no-op), then writes them with `fwrite` one byte at a time, for reasons that escape me. Without `_setmode`, you get one character for every byte, including "unknown character" glyph for zeros. With `_setmode`, you get an assert deep inside `fwrite` complaining that it's asked to write an odd number of bytes. – Igor Tandetnik Oct 05 '13 at 18:52
  • Thank you so much for your fast response, clear explanation, and workaround solution, Igor! I always get lost trying to decipher the iostream (and related) headers so your insights were very helpful! – charunnera Oct 05 '13 at 22:41
  • It seems the Microsoft compiler does not know how to print Unicode characters even in my current locale (Russian): it prints the leading ASCII characters and then stops. – bialix Oct 01 '14 at 12:38
  • @IgorTandetnik "wcout, or to be precise, (...), *converts wide characters to narrow characters*": how do you know that? I'm just curious where people get this under-the-hood knowledge :D. – Sonny D Jul 10 '20 at 22:11
  • Visual Studio comes with the source code for the C++ standard library. You can step through it in the debugger. At least, that was the case in 2013 when this answer was posted. – Igor Tandetnik Jul 11 '20 at 00:22

Setting the mode for stdout to _O_U16TEXT will allow you to write Unicode characters to the wcout stream as well as with wprintf. (See "Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?") This is the right way to make this work.

_setmode(_fileno(stdout), _O_U16TEXT);

std::wcout << L"hello\xf021test!" << std::endl;
std::wcout << L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd" << std::endl;
std::wcout << L"Now this prints!" << std::endl;

It shouldn't be necessary anymore, but you can reset a stream that has entered an error state by calling clear:

if (std::wcout.fail())
{
    std::wcout.clear();
}
Eric MSFT
  • Unfortunately, this no longer works. An assert `buffer_size %2 == 0` fails because an odd number of characters was written. – Spencer Jan 25 '19 at 14:13