
There are four encoding options when creating a .txt file in Windows:

  • ANSI
  • UNICODE (little endian)
  • UNICODE (big endian)
  • UTF-8

Text File Encoding Option

The C standard library supports this option through FILE:

C STL

FILE* file;
file = _wfopen(L"test.txt", L"wt+,ccs=UTF-16LE");

It has been working great, but I found there is no parameter for this in std::ofstream.

wofstream myfile;
myfile.open("example.txt", ?????????);

So, I want to know how to create files like this in C++. Is there any solution for this in C++ STL?

hichris123
drek
  • The term `STL` is used to refer only to those parts of the Standard Library that deal with *iterators*, *containers*, and *algorithms*, not files. **See:** https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-glossary – Galik Mar 01 '16 at 09:31
  • [What's this STL vs C++ Standard Library fight all about?](http://stackoverflow.com/questions/5205491/whats-this-stl-vs-c-standard-library-fight-all-about) – Bo Persson Mar 01 '16 at 09:38
  • The C standard library doesn't support anything like that. Your Windows C runtime library that comes with MSVC does. There's absolutely nothing standard about it. – n. m. could be an AI Mar 01 '16 at 09:47
  • http://www.cplusplus.com/reference/codecvt/codecvt_utf16/ – Hans Passant Mar 01 '16 at 09:55
  • I'm talking about the file saving option, not the contents in txt file. – drek Mar 01 '16 at 10:11
  • Please use a tag in this format: @ee_do to answer a specific user's comment. The user will get a message, and everyone will understand the flow of the conversation. – n. m. could be an AI Mar 01 '16 at 10:37

2 Answers


Starting with C++11, the standard C++ library lets you generate UTF-16 text files with the following steps:

  • build a locale using the C++11 class std::codecvt_utf16; the endianness is selected through its mode template argument (big-endian by default)
  • open a file using a std::wofstream, to which you will write the Unicode text
  • imbue the locale into the wide stream and start writing, optionally starting with a Byte Order Mark character (U+FEFF)

Here is an example adapted from the page referenced by @HansPassant in his comment:

// codecvt_utf16: writing unicode string as UTF-16
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
#include <fstream>

int main ()
{
  std::wstring str ( { 0xa8, 0xa9 });

  std::locale loc (std::locale(), new std::codecvt_utf16<wchar_t>);
  std::basic_ofstream<wchar_t> ofs ("test.txt");
  ofs.imbue(loc);

  std::cout << "Writing to file (UTF-16)... ";
  ofs << (wchar_t) 0xfeff; // BOM
  ofs << str;
  std::cout << "done!\n";

  return 0;
}

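You get a UTF-16 file that starts with a big-endian BOM (bytes FE FF) and contains the characters U+00A8 and U+00A9 (¨©). std::codecvt_utf16 writes big-endian by default; pass std::little_endian as its third template argument for little-endian output. Note that od -x groups bytes into 16-bit words in host byte order, so on a little-endian machine the bytes FE FF are displayed as fffe.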

(hexadecimal dump:

$ od -xc test.txt
0000000      fffe    a800    a900
         376 377  \0 250  \0 251

)
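For the "UNICODE (little endian)" case from the question, the same approach can be sketched as below. This is a minimal sketch, not a definitive implementation: the helper name write_utf16le is hypothetical, it assumes the text stays within the Basic Multilingual Plane, and it relies on <codecvt>, which has been deprecated since C++17. The stream is opened in binary mode on the assumption that Windows text mode would otherwise expand 0x0A bytes inside the UTF-16 data.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-16LE, preceded by a BOM.
bool write_utf16le(const char* path, const std::wstring& text)
{
    std::wofstream ofs;

    // Imbue before opening so the facet is in place for the very first write.
    // The third template argument selects little-endian output.
    ofs.imbue(std::locale(ofs.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    // Binary mode: no newline translation is applied to the encoded bytes.
    ofs.open(path, std::ios::binary);
    if (!ofs)
        return false;

    ofs.put(L'\xfeff');   // BOM, encoded by the facet as the bytes FF FE
    ofs << text;
    ofs.close();
    return !ofs.fail();
}
```

Calling write_utf16le("test.txt", L"h\u00e9") should produce a six-byte file: the BOM followed by two little-endian code units.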

Serge Ballesta
  • Credit should go to @HansPassant, who gave the reference in his comment – Serge Ballesta Mar 01 '16 at 11:37
  • If you use `new std::codecvt_utf16<wchar_t, 0x10ffff, std::generate_header>` instead, you don't have to write a BOM manually, one will be written automatically. – Remy Lebeau Mar 02 '16 at 02:16
  • In any case, if you are not using C++11, you can still use `std::wofstream` to produce UTF-16 encoded files, you don't really need to imbue a locale to accomplish that. On Windows, writing a `wchar_t` to a `std::wofstream` outputs UTF-16 encoded data. That being said, `imbue()` and `std::locale` are available in pre-C++11 versions. It is `std::codecvt_utf16` that was added in C++11. You can create and imbue UTF-16 locales in earlier versions. – Remy Lebeau Mar 02 '16 at 02:17
  • @Remy Lebeau Thank you for the clear explanations. You're saying that using std::wofstream and wchar_t is enough to create Unicode-encoded data. That is true, but that data is the contents of the file; the file itself is still ANSI encoded. So, can you show me how to accomplish that without std::codecvt_utf16? – drek Mar 02 '16 at 03:22
  • @Remy Lebeau And if I use "generate_header", it creates a big-endian UTF-16 encoded file. To produce a little-endian file, I have to use "little_endian" instead of "generate_header". – drek Mar 02 '16 at 04:05
  • @ee_do: You can use both via the OR (`|`) operator: `new std::codecvt_utf16<wchar_t, 0x10ffff, std::codecvt_mode(std::generate_header | std::little_endian)>` – Remy Lebeau Mar 02 '16 at 04:17
  • "can still use std::wofstream to produce UTF-16 encoded files" The standard says nothing about how wofstreams are encoded. "On Windows, writing a wchar_t to a std::wofstream outputs UTF-16 encoded data" Not with commonly available compilers like MSVC or GCC (unless you imbue a locale with appropriate codecvt). – n. m. could be an AI Mar 02 '16 at 12:22

There is no "C STL". STL stands for Standard Template Library. C does not have templates. You may be referring to the C standard library and C++ standard library.

The C standard library has no functions for "creating Unicode" or converting text to or from Unicode. There is no _wfopen in the C standard library. You're using a function from the Microsoft C Run-Time Library.

The C++ library does have an API to convert between (UTF-8 and UTF-16) and (UTF-8 and UTF-32) and (system native wide and system native multibyte) encodings: http://en.cppreference.com/w/cpp/locale/codecvt
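As a rough illustration of that conversion API, the sketch below converts between UTF-8 and UTF-16 in memory using std::wstring_convert with std::codecvt_utf8_utf16. The function names utf16_to_utf8 and utf8_to_utf16 are hypothetical, and both facilities have been deprecated since C++17, so treat this as one possible approach rather than a recommendation.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8 bytes.
// std::wstring_convert throws std::range_error if the input is invalid.
std::string utf16_to_utf8(const std::u16string& text)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(text);
}

// Hypothetical helper: decodes UTF-8 bytes into a UTF-16 string.
std::u16string utf8_to_utf16(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes);
}
```

For example, U+00E9 is the single code unit 0x00E9 in UTF-16 and the two bytes C3 A9 in UTF-8.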

There is hardly any other support for unicode in the standard library. You must take care that the string that you're writing is in the encoding that you want it to be and you must explicitly write a BOM if you need to.
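To make the last point concrete, here is a minimal sketch that avoids locale facets entirely: it takes text that is already UTF-16 (a std::u16string) and writes the BOM and each code unit byte by byte in little-endian order. The helper name write_utf16le_bytes is hypothetical, and the caller is assumed to supply valid UTF-16.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes UTF-16 code units to `path` in little-endian
// byte order, preceded by an explicit BOM.
bool write_utf16le_bytes(const char* path, const std::u16string& text)
{
    // Binary mode so the bytes reach the file exactly as written.
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;

    // U+FEFF in little-endian order: low byte first.
    out.put('\xFF');
    out.put('\xFE');

    for (char16_t unit : text) {
        out.put(static_cast<char>(unit & 0xFF));         // low byte
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }

    out.close();
    return !out.fail();
}
```

Swapping the two put calls inside the loop, and the two BOM bytes, would give the big-endian variant.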

eerorika