#include <stdio.h>

int main()
{
    char c = 0x41;
    printf("char is : %c\n",c);

    c = 0xe9;
    printf("char is : %c\n",c);

    unsigned int d = 0x164e;
    printf("char is : %c\n",d);


    return 0;
}

What I want to print is:

enter image description here

I am running 64-bit Ubuntu in VMware Workstation on Windows, and I inspected the file with an octal dump:

enter image description here

These are the hexadecimal values of the three characters, taken from a UTF-16 LE text file.

The output:

enter image description here

How can I print UTF-16 characters correctly?

Patrick
  • Use `wchar_t` and read e.g. [this `printf` (and family) reference](http://en.cppreference.com/w/c/io/fprintf) to check the modifiers you can have for printing wide characters. – Some programmer dude Sep 19 '16 at 15:02
  • As for why your code prints `'N'`, check e.g. [this ASCII table](http://en.cppreference.com/w/c/language/ascii) and look at what the hexadecimal value `0x4e` (the low byte in your integer) corresponds to. Then think about that for a while to try and figure out why the `'N'` is printed. – Some programmer dude Sep 19 '16 at 15:04
  • Related: [char vs wchar_t vs char16_t vs char32_t (c++11)](http://stackoverflow.com/q/19068748/2402272) – John Bollinger Sep 19 '16 at 15:19
  • The 0xFF and 0xFE bytes are the byte order mark. – Jonathan Leffler Sep 19 '16 at 15:25
  • @JoachimPileborg: `wchar_t` is 32 bits on Linux (not sure about the encoding, though); UTF-16 is a variable-length code. – too honest for this site Sep 19 '16 at 15:50
  • Your question is not clear. Do you mean the **encoding** UTF-16, or the glyphs you show in the images, however they are encoded? Linux systems, like most other sane OSes, normally use UTF-8, not UTF-16. Anyway, variable-length character encoding is beyond standard C. Maybe there is a library, but asking for a library is off-topic here. – too honest for this site Sep 19 '16 at 15:51
  • Question 1: Can your `stdout` display `世`? I see it in the first pic, but is that due to `stdout`? – chux - Reinstate Monica Sep 19 '16 at 15:52
  • What happens if you use `%lc` (ell cee) instead of `%c`? – Ian Abbott Sep 19 '16 at 15:59
  • The octal dump you gave contains the UTF-16 code 0x4e16, not 0x164e as in your C code. – Ian Abbott Sep 19 '16 at 16:09
  • If your Linux terminal session is set to a suitable UTF-8 locale, and can display UTF-8 sequences correctly, then all your C program needs to do is call `setlocale(LC_CTYPE, "");` (declared by `#include <locale.h>`) and use the format specifier `%lc` to output a character contained in a corresponding parameter of type `wint_t` (declared by `#include <wchar.h>`) as a multi-byte UTF-8 sequence. If redirecting to a file, the redirected output will be encoded as UTF-8. If you want the output to be encoded as UTF-16, you will have to do something else. – Ian Abbott Sep 19 '16 at 17:00
