
I am trying to write special characters to a file. Specifically, something along the lines of the 'ă' character, which apparently has the code point U+0103.

I do not understand how to set the encoding to Unicode, or how to actually print that character. Everything I tried, including wchar_t, only prints '?'.

And if I read from a text file using wchar_t, will it read character by character? A normal character in a text file is 1 byte, while wchar_t is 2 or 4 bytes. Do I need to read with char and convert?

Some example source code would be appreciated... Thanks in advance!

RatkinHHK
  • You seem to have some conceptual issues; start here: http://www.joelonsoftware.com/articles/Unicode.html – thebjorn Apr 26 '15 at 09:48
  • To start with something easy: a) stop thinking of "Unicode" as one single encoding (it isn't), b) don't believe that wchar_t is the easy solution for everything (in fact, wchar_t isn't bound to any Unicode encoding in any way. It can be anything.). c) For all three most usual Unicode encodings (UTF-8/16/32), wchar_t with 2 bytes isn't enough to store a character (whatever a character is, because it's used with multiple meanings) – deviantfan Apr 26 '15 at 10:28
  • @thebjorn --> Thanks for the link. It made a lot of things a lot clearer. So how do I make a program print in a specific encoding? Or if I have a string encoded some way, how do I 'translate' it? – RatkinHHK Apr 26 '15 at 11:24

1 Answer


The terminology you'll need when searching is "encoding" for going from Unicode -> bytes, and "decoding" when going from bytes -> Unicode. In general you must know which encoding the bytes have.
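To make "encoding" concrete, here is a minimal sketch that turns a single Unicode code point into UTF-8 bytes by hand. The function name `encode_utf8` is made up for this illustration and is not part of any library; in real code you would more likely let Boost.Locale or ICU do this.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library function): encode one Unicode
// code point as a sequence of UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a valid Unicode code point");
    }
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `encode_utf8(0x0103)` yields the two bytes 0xC4 0x83, which is how 'ă' is stored in a UTF-8 file. Decoding is the reverse: you read the bytes and reassemble the code point, which only works if you know the bytes are UTF-8 to begin with.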

To be able to print to the console, you'll need to encode your Unicode string into the console's encoding. On Linux that is usually UTF-8, while on Windows it could be something less useful like cp1252 (though it can be changed).
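The same idea applies to files: a file holds bytes, so you pick an encoding and write the encoded bytes. A minimal sketch, assuming you want UTF-8 and using plain `char` streams in binary mode so nothing is converted behind your back. The helper names `write_bytes` and `read_bytes` are invented for this example.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write already-encoded bytes to a file unchanged.
bool write_bytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.close();                       // flush before reporting success
    return static_cast<bool>(out);     // false if open, write, or close failed
}

// Hypothetical helper: read a whole file back as raw bytes.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, `write_bytes("out.txt", "\xC4\x83")` stores 'ă' as two bytes, and `read_bytes("out.txt")` returns a `std::string` of size 2. So reading with `char` gives you bytes rather than characters; you then decode those bytes according to the file's encoding, which is the "read with char and convert" approach from the question.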

I would suggest looking at Boost.Locale (http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/index.html) or ICU (http://site.icu-project.org/) when working with Unicode in C++ (other languages have more mature and easier-to-use Unicode functionality, if you're not locked into C++).

thebjorn