
As you may remember, Windows Notepad lets you choose an encoding in its "Save As..." dialog: ASCII (default), UTF-8, Unicode, and Big Endian. I need to write a program that does something with the text of an ASCII .txt file and saves the result as a Unicode .txt file.

  • From what I found, Unicode here means UTF-16LE (without BOM). If I'm wrong, please correct me.
  • I tried reading the ASCII file as char and converting each char to wchar_t one by one. That works, but I get UTF-8 instead of UTF-16LE. This is how I do it:

    int result = (int)input_char; // input_char is a char read from the ASCII file
    while(result<0) result+=256;
    wchar_t output_wchar = wchar_t(result);
    

This code works fine and doesn't lose any ASCII symbols.

  • I also know that UTF-16LE is coded as U+hhhh codes. So, if the previous steps are right, my problem is: how do I put a U+hhhh code into a wchar_t in C++?
James MV
shahan
    Using 'Unicode' as the name of an encoding is a sort of Microsoftism that exists throughout Windows, Microsoft's documentation, and anywhere that people learn terminology from those sources. Strictly speaking, Unicode is not an encoding at all. Typically you should use 'UTF-16' instead. Also, UTF-16LE prohibits a BOM, so it's redundant to explicitly say '(without BOM)'. – bames53 Apr 24 '13 at 21:44
    May I ask you why do it? Why save anything as UTF-16LE? http://utf8everywhere.org – Pavel Radzivilovsky Apr 25 '13 at 22:38
  • @Pavel Radzivilovsky Yes, of course. I found my old iPod nano 3G. It can display .txt files (notes), but each note can't contain more than SOME_AMOUNT non-space symbols (I don't remember the exact number now). So, if you want to upload a very long note, you have to split it into many smaller files. However, if you upload a note file from Windows, it has to be encoded as 'Unicode' with Windows Notepad. Actually, I don't need this program anymore, but it's the only problem that I failed to solve in my school days and still can't solve. – shahan Jul 31 '14 at 16:07

1 Answer


If your source is ASCII, wchar_t is 2 bytes wide, and you are on a little-endian system (which I think is a safe guess here), there is really nothing to do beyond the implicit conversion.

    wchar_t output_char = input_char;

Then you can just bit blast the wchar_ts to wherever you want to write them.
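If you would rather not depend on the size of wchar_t or on the byte order of the machine, the same idea can be written out byte by byte. The following is only a minimal sketch, not the only way to do it: `to_utf16le` and `write_utf16le` are hypothetical helper names, and the `with_bom` flag is an assumption added for illustration. Passing `true` prepends the bytes FF FE, which is the byte order mark that Notepad's "Unicode" option puts at the start of a file; passing `false` gives plain UTF-16LE. Bytes above 0x7F are treated as Latin-1, which mirrors the `result += 256` logic in the question, so only pure ASCII input is guaranteed to round-trip meaningfully.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: encodes each input byte as one UTF-16LE code unit.
// For ASCII the code point equals the byte value, so the low byte is the
// character itself and the high byte is zero.
std::string to_utf16le(const std::string& input, bool with_bom) {
    std::string out;
    out.reserve(input.size() * 2 + 2);
    if (with_bom) {
        // U+FEFF in little-endian byte order
        out.push_back('\xFF');
        out.push_back('\xFE');
    }
    for (char c : input) {
        out.push_back(c);     // low byte first (little-endian)
        out.push_back('\0');  // high byte is zero for U+0000..U+00FF
    }
    return out;
}

// Hypothetical helper: writes the encoded bytes to a file. Binary mode
// matters on Windows, where text mode would expand the 0x0A byte to 0x0D 0x0A
// and corrupt the 2-byte code units.
bool write_utf16le(const std::string& path, const std::string& text,
                   bool with_bom) {
    std::ofstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    const std::string bytes = to_utf16le(text, with_bom);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

For example, `write_utf16le("note.txt", "Hello", true)` would produce a 12-byte file: the 2-byte mark followed by five 2-byte code units. The file name here is just a placeholder.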

R. Martinho Fernandes