
Is it possible to imbue a std::fstream so that a std::string containing UTF-8 encoded text can be streamed to a UTF-16 file?

I tried the following using the utf8-to-utf16 facet, but the result file is still UTF-8:

std::fstream utf16_stream("test.txt", std::ios_base::trunc | std::ios_base::out);
utf16_stream.imbue(std::locale(std::locale(), new std::codecvt_utf8_utf16<wchar_t, 0x10ffff,
                               std::codecvt_mode(std::generate_header | std::little_endian)>));

std::string utf8_string = "\x54\xE2\x83\xac\x73\x74";

utf16_stream << utf8_string;

References for the codecvt_utf8_utf16 facet seem to indicate it can be used to read and write UTF-8 files, not UTF-16 - is that correct, and if so, is there a simple way to do what I want to do?

Joris Timmermans
  • Using UTF-8 internally and UTF-16 externally is perverse. If UTF-16 ever makes sense it's as an internal encoding to simplify using all those 90s APIs that were misguided enough to use UTF-16 natively. – bames53 Jul 17 '13 at 18:27
  • @bames53 - the requirement is for compatibility with Windows applications that unfortunately read and write "UTF16LE" files (though considering the number of difficulties I've met the past weeks, they might actually be some kind of Microsoft UCS2-ish abomination). – Joris Timmermans Jul 18 '13 at 06:53

1 Answer


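File streams do not support N:M character encoding conversions, which is what UTF-8 internal / UTF-16 external would require. The requirements placed on a codecvt facet used by std::basic_filebuf (§22.4.1.4.2 [locale.codecvt.virtuals]/3) mean that each internal character must be convertible on its own, i.e. only 1:N conversions are supported.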

You'd have to build a UTF-16 string, e.g. by using std::wstring_convert, reinterpret it as a sequence of bytes, and output it using an ordinary (non-converting) std::ofstream.
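
A minimal sketch of that first approach, assuming a compiler that still ships <codecvt> (deprecated since C++17). The helper names utf8_to_utf16, utf16le_bytes and write_utf16le_file are made up for illustration. Instead of reinterpreting the char16_t buffer in place, the sketch serializes each code unit byte by byte, so the output is little-endian regardless of the host, and it writes the BOM by hand:

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units, in memory.
// from_bytes throws std::range_error on malformed input.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> little-endian bytes,
// preceded by the UTF-16LE byte order mark (FF FE).
std::string utf16le_bytes(const std::u16string& s) {
    std::string out = "\xFF\xFE";
    for (char16_t c : s) {
        out.push_back(static_cast<char>(c & 0xFF));         // low byte first
        out.push_back(static_cast<char>((c >> 8) & 0xFF));  // then high byte
    }
    return out;
}

// Hypothetical helper: writes the bytes through a plain, non-converting
// stream. Binary mode keeps the runtime from translating newline bytes.
void write_utf16le_file(const std::string& path, const std::string& utf8) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::string bytes = utf16le_bytes(utf8_to_utf16(utf8));
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

With this, something like write_utf16le_file("test.txt", utf8_string) replaces the imbue attempt from the question; no locale is involved on the stream at all.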

Alternatively, convert the UTF-8 to a wide string first, and then use std::codecvt_utf16, which produces UTF-16 as a sequence of bytes and can therefore be used with file streams.
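
A rough sketch of the wide-string route, again assuming <codecvt> is available. The function name write_utf16le_via_facet is hypothetical. The UTF-8 input is first widened with std::codecvt_utf8<wchar_t>, and the wide stream is imbued with std::codecvt_utf16 using the generate_header and little_endian flags so that the facet emits the BOM and the byte order. Note the assumption about wchar_t: where it is 32 bits wide this covers all of Unicode, but where it is 16 bits wide (e.g. Windows) these two facets work in UCS-2 and characters outside the BMP will not convert.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 -> wide -> UTF-16LE file with BOM.
void write_utf16le_via_facet(const std::string& path, const std::string& utf8) {
    // Step 1: UTF-8 -> wide string (UCS-4 or UCS-2, depending on wchar_t).
    std::wstring_convert<std::codecvt_utf8<wchar_t>> to_wide;
    const std::wstring wide = to_wide.from_bytes(utf8);

    // Step 2: imbue before opening, so the facet is in place for all output.
    std::wofstream out;
    out.imbue(std::locale(
        out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff,
            std::codecvt_mode(std::generate_header | std::little_endian)>));

    // Binary mode: the facet already produces raw bytes, so the runtime
    // must not apply newline translation on top of them.
    out.open(path, std::ios::binary | std::ios::trunc);
    out << wide;
}
```

This is a 1:N conversion from the stream's point of view (one wide character in, two or four bytes out), which is why it fits within what file streams allow.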

Cubbi
  • I guess that also means I can't use the codecvt_mode flags to ensure the endianness and the BOM on output - I'll have to write those myself too? – Joris Timmermans Jul 17 '13 at 14:48
  • @MadKeithV Right.. but you can use them if you're going utf-8 -> wide -> utf-16 (see edit) – Cubbi Jul 17 '13 at 15:25