
I'm trying to print this array, which contains Unicode characters:

unsigned short RussianStr[] = { 0x044D, 0x044E, 0x044F, 0x0000};

For this reason I cannot use an array of char and have to use unsigned short instead. How do I print all the characters in the array? With the printf() function I only see the first character printed.

  • possible answer inside comments: https://stackoverflow.com/questions/39576310/how-to-print-the-utf-16-characters-in-c – Tarick Welling Jul 02 '19 at 07:51
  • You should still use unsigned char because your code depends on the endianness of the system. I.e. the resulting bytes might either be `{ 0x04, 0x4D, 0x04, 0x4E, ...` or `{ 0x4D, 0x04, 0x4E, 0x04, ...` – Chris Jul 02 '19 at 07:59
  • @Chris I had also thought of separating the 16 bits into 2 different bytes, but then how can I print the correct character? – Antonino INDEVA Jul 02 '19 at 08:08
  • I was thinking of UTF-8 here, but I see you did not mention if your environment supports that. I think a valid answer would need more information about which environment you need to support. – Chris Jul 02 '19 at 09:12
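
The endianness point raised in the comments can be checked directly. The following is a minimal sketch, not code from the thread: `first_unit_bytes` is a hypothetical helper name, and it assumes `sizeof(unsigned short) == 2`, which is common but not guaranteed by the C standard. It shows which byte of the first code unit comes first in memory, which is what a byte-oriented function such as `printf("%s", ...)` would see.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): writes the in-memory bytes
 * of the first 16-bit code unit as two hex pairs into out.
 * Assumption: sizeof(unsigned short) == 2.
 */
void first_unit_bytes(const unsigned short *units, char out[6])
{
    unsigned char raw[2];

    /* Copy the raw storage so the bytes can be inspected one at a time. */
    memcpy(raw, units, sizeof raw);

    /* Five characters plus the terminator fit exactly in out[6]. */
    sprintf(out, "%02X %02X", (unsigned)raw[0], (unsigned)raw[1]);
}
```

For the array in the question, a little-endian machine yields `4D 04` and a big-endian machine yields `04 4D`. In the little-endian case the first byte, 0x4D, is the ASCII code for `M`, so byte-oriented output does not see a Cyrillic letter at all.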

2 Answers


There are specialized functions and types for dealing with wide characters:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

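int main(void)
{
    /* Code units for the Cyrillic letters э, ю, я, plus a terminator */
    wchar_t RussianStr[] = {0x044D, 0x044E, 0x044F, 0x0000};

    /* Adopt the environment's locale so wide output is encoded correctly */
    setlocale(LC_ALL, "");
    wprintf(L"%ls\n", RussianStr);
    return 0;
}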
David Ranieri
  • On my machine (with `LANG=en_US.UTF-8`), the output is: эюя, which looks plausibly Russian. – Jonathan Leffler Jul 02 '19 at 08:09
  • I use C in the microcontroller environment (more specifically Microchip) through the MPLAB X IDE. I'm not sure the wchar.h library exists – Antonino INDEVA Jul 02 '19 at 08:10
  • In that case, @AntoninoINDEVA, you're stuck. You either have to find the support or you have to write your own, possibly including the font file for Cyrillic (or a full Unicode font), and the relevant display code, etc. That would be a moderately major undertaking; you probably won't complete it overnight. OTOH, if your embedded environment already supports UTF-8 (and Cyrillic), you can fairly simply write code to convert from UTF-16 to UTF-8 if you can't find any built-in support for the operation. – Jonathan Leffler Jul 02 '19 at 08:11
  • @chux, yes, but take a look at [this explanation](https://www.linux.com/news/programming-wide-characters): You can use `printf` to output wide character strings, but `wprintf` is more appropriate because it handles wide characters natively. For example, the unit for length modifiers is "wide characters" with `wprintf` -- as opposed to bytes with `printf`. – David Ranieri Jul 02 '19 at 09:18
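
The UTF-16 to UTF-8 conversion mentioned in the comments above can be written without `wchar.h`, which may help in an embedded toolchain. The following is a minimal sketch, not code from the thread: `utf16_to_utf8` is a hypothetical helper name, and it assumes the input is a 0-terminated array of native-endian 16-bit code units and that the output device actually interprets UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the thread): converts a 0-terminated
 * UTF-16 string held in native-endian 16-bit code units to a
 * 0-terminated UTF-8 string in out.
 * Returns the number of bytes written, excluding the terminator, or
 * (size_t)-1 if out is too small or a surrogate is unpaired.
 */
size_t utf16_to_utf8(const unsigned short *in, char *out, size_t out_size)
{
    size_t n = 0;
    size_t len;
    unsigned long cp, lo;

    if (out_size == 0)
        return (size_t)-1;

    while (*in != 0) {
        cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            in++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;   /* low surrogate without a high one */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len + 1 > out_size)
            return (size_t)-1;   /* keep room for the terminator */

        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    out[n] = '\0';
    return n;
}
```

With the array from the question, `utf16_to_utf8(RussianStr, buf, sizeof buf)` fills `buf` with six bytes (two per Cyrillic letter), which can then be passed to `printf("%s", buf)` or to a display routine that understands UTF-8.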

The problem is not how to print UTF-16 values, but whether your terminal can display Unicode at all.

If your terminal is UTF-8 capable, you only need to use the wchar_t alternatives to the printf family of functions and use wchar_t characters instead of char. As terminals are normally byte-oriented, the locale-aware output functions convert the wide characters to UTF-8 before writing them.

See wprintf(3) and many others.

Luis Colorado