The implementation of wchar_t strings
under the Windows stdlib is UTF-16-oblivious: it knows only about 16-bit code units.
So you can put a UTF-16 surrogate sequence in a string, and you can choose to treat that as a single character using higher level processing. The string implementation won't do anything to help you, nor to hinder you; it will let you include any sequence of code units in your string, even ones that would be invalid when interpreted as UTF-16.
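For instance, here is a minimal sketch (assuming a Windows toolchain, where wchar_t is a 16-bit code unit) showing that std::wstring will happily store a surrogate sequence that is not even valid UTF-16:

```cpp
#include <iostream>
#include <string>

int main() {
    // A low surrogate followed by a high surrogate: invalid as UTF-16,
    // but std::wstring neither validates nor interprets the code units.
    std::wstring s = L"\xDC01\xD801";

    // size() counts code units, not characters; a *valid* surrogate pair
    // would also report 2 here.
    std::wcout << L"code units: " << s.size() << L"\n";
}
```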
Many of the higher-level features of Windows do support characters made out of UTF-16 surrogates, which is why you can give a file a name containing a supplementary-plane character
and see it both render correctly and edit correctly (taking a single keypress, not two, to move past the character) in programs like Explorer that support complex text layout (typically using Windows's Uniscribe library).
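As a sketch of what that looks like at the API level (the particular character, U+10400, a Deseret letter outside the BMP written as the surrogate pair D801 DC00, is just an illustrative choice):

```cpp
#include <windows.h>

int main() {
    // U+10400 encoded as the surrogate pair D801 DC00. The file APIs pass
    // the name through as raw 16-bit code units; Explorer and other
    // Uniscribe-aware programs then treat the pair as a single character
    // when rendering and editing it.
    const wchar_t name[] = L"\xD801\xDC00.txt";

    HANDLE h = CreateFileW(name, GENERIC_WRITE, 0, nullptr,
                           CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE) {
        CloseHandle(h);
    }
}
```

Open the containing folder in Explorer afterwards and the name shows up as one character, not two boxes.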
But there are still places where you can see the UTF-16-obliviousness shining through, such as the fact that you can create two files whose names differ only in the case of a supplementary-plane letter
in the same folder, where case-insensitivity would otherwise disallow it, or the fact that you can create a file called [U+DC01][U+D801].txt
programmatically, even though that sequence (a low surrogate followed by a high surrogate) is not valid UTF-16 at all.
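A sketch of that last point, again against the raw Win32 file API (error handling omitted): the file name handling compares and stores 16-bit code units, so it accepts a name that no UTF-16 decoder would:

```cpp
#include <windows.h>

int main() {
    // A low surrogate followed by a high surrogate: never valid UTF-16,
    // but the file system only sees a sequence of 16-bit code units, so
    // the call succeeds. Programs that try to interpret the name as
    // UTF-16 may render it with replacement glyphs.
    const wchar_t name[] = L"\xDC01\xD801.txt";

    HANDLE h = CreateFileW(name, GENERIC_WRITE, 0, nullptr,
                           CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE) {
        CloseHandle(h);
    }
}
```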
This is how pedants can have a nice long and basically meaningless argument about whether Windows “supports” UTF-16 strings or only UCS-2.