How to print 2-byte unicode characters

Question

I'm trying to print this array, which contains Unicode characters:

unsigned short RussianStr[] = { 0x044D, 0x044E, 0x044F, 0x0000};

For this reason I cannot use an array of char and have to use unsigned short instead. How do I print all the characters in the array? With the printf() function I only see the first character printed.

possible answer inside comments: https://stackoverflow.com/questions/39576310/how-to-print-the-utf-16-characters-in-c — Tarick Welling, Jul 02 '19 at 07:51
You should still use unsigned char because your code depends on the endianness of the system. I.e. the resulting bytes might either be `{ 0x04, 0x4D, 0x04, 0x4E, ...` or `{ 0x4D, 0x04, 0x4E, 0x04, ...` — Chris, Jul 02 '19 at 07:59
@Chris I had also thought of separating the 16 bits into 2 different bytes, but then how can I print the correct character? — Antonino INDEVA, Jul 02 '19 at 08:08
I was thinking of UTF-8 here, but I see you did not mention if your environment supports that. I think a valid answer would need more information about which environment you need to support. — Chris, Jul 02 '19 at 09:12
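
To make the byte-order point in the comments above concrete, here is a minimal sketch that writes each 16-bit code unit out as two bytes in an explicitly chosen order, so the result no longer depends on the endianness of the system. The helper name `utf16_to_bytes` is hypothetical and not part of any library; it assumes the array is terminated by a zero code unit, as in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize a 0-terminated array of 16-bit code units
 * into bytes using an explicit byte order. Returns the number of bytes
 * written (the terminator itself is not written). `out` must have room
 * for 2 bytes per code unit. */
size_t utf16_to_bytes(const uint16_t *units, uint8_t *out, int big_endian)
{
    size_t n = 0;
    assert(units != NULL && out != NULL);
    for (; *units != 0; ++units) {
        uint8_t hi = (uint8_t)(*units >> 8);    /* most significant byte */
        uint8_t lo = (uint8_t)(*units & 0xFF);  /* least significant byte */
        out[n++] = big_endian ? hi : lo;
        out[n++] = big_endian ? lo : hi;
    }
    return n;
}
```

With the array from the question, `big_endian = 1` gives `04 4D 04 4E 04 4F` and `big_endian = 0` gives `4D 04 4E 04 4F 04`, matching the two layouts mentioned in the comment.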

score 3 · Answer 1 · answered Jul 02 '19 at 07:56

There are specialized functions and types for dealing with wide characters:

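#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    /* Wide-character string, terminated by a zero code unit. */
    wchar_t RussianStr[] = {0x044D, 0x044E, 0x044F, 0x0000};

    /* Adopt the environment's locale so that wide characters can be
       converted to the terminal's encoding (e.g. UTF-8). */
    setlocale(LC_ALL, "");
    wprintf(L"%ls\n", RussianStr);
    return 0;
}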

David Ranieri


On my machine (with `LANG=en_US.UTF-8`), the output is: эюя — which looks plausibly Russian. – Jonathan Leffler Jul 02 '19 at 08:09
I use C in a microcontroller environment (more specifically Microchip) through the MPLAB X IDE. I'm not sure the wchar.h library exists there. – Antonino INDEVA Jul 02 '19 at 08:10
In that case, @AntoninoINDEVA, you're stuck. You either have to find the support or you have to write your own, possibly including the font file for Cyrillic (or a full Unicode font), the relevant display code, etc. That would be a moderately major undertaking; you probably won't complete it overnight. OTOH, if your embedded environment already supports UTF-8 (and Cyrillic), you can fairly simply write code to convert from UTF-16 to UTF-8 if you can't find any built-in support for the operation. – Jonathan Leffler Jul 02 '19 at 08:11
@chux, yes, but take a look at [this explanation](https://www.linux.com/news/programming-wide-characters): You can use `printf` to output wide character strings, but `wprintf` is more appropriate because it handles wide characters natively. For example, the unit for length modifiers is "wide characters" with `wprintf` -- as opposed to bytes with `printf`. – David Ranieri Jul 02 '19 at 09:18
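
For the embedded case raised in the comments above, where wchar.h may be missing but the output path accepts UTF-8, the UTF-16 to UTF-8 conversion can be written by hand. The following is only a sketch under those assumptions: the function name `utf16_to_utf8` is hypothetical, the input is assumed to be in host byte order and terminated by a zero code unit, and whether the target display actually renders UTF-8 (and has Cyrillic glyphs) is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a 0-terminated UTF-16 string (host byte
 * order) to a NUL-terminated UTF-8 string. Returns the number of bytes
 * written, excluding the terminator, or (size_t)-1 if `out` is too small
 * or the input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;
    while (*in != 0) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            uint32_t lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                     /* missing low half */
            ++in;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                         /* stray low surrogate */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= out_size)
            return (size_t)-1;                         /* keep room for NUL */
        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For the array in the question this produces the six bytes `D1 8D D1 8E D1 8F`, which can then be passed to whatever byte-oriented output routine the platform provides, for example `printf("%s", buf)` on a UTF-8 terminal.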

score 0 · Answer 2 · answered Jul 03 '19 at 13:18

The problem is not how to print UTF-16 values, but whether your terminal can display Unicode at all.

If your terminal is UTF-capable, you only have to use the wchar_t alternatives to the printf family of functions and use wchar_t characters instead of char. As terminals are normally byte-oriented, the locale functions convert the wide characters to the locale's multibyte encoding (typically UTF-8), and UTF-8 bytes are output.

See wprintf(3) and many others.
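
The locale-driven conversion described above can also be invoked directly with the standard `wcstombs()` function, which is roughly the step that the wide-character output functions perform for you. This is a minimal sketch, assuming a hosted environment with wchar.h and a locale already selected via `setlocale()`; the wrapper name `wide_to_multibyte` is hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to the current locale's
 * multibyte encoding with the standard wcstombs(). Returns the byte count
 * (excluding the terminator), or (size_t)-1 if a character cannot be
 * represented in the locale or `out` is too small. */
size_t wide_to_multibyte(const wchar_t *ws, char *out, size_t out_size)
{
    size_t n;
    assert(ws != NULL && out != NULL);
    n = wcstombs(out, ws, out_size);
    if (n == (size_t)-1 || n >= out_size)
        return (size_t)-1;  /* unrepresentable character or no room for NUL */
    return n;
}
```

After `setlocale(LC_ALL, "")` in a UTF-8 locale, converting the three Cyrillic characters from the question should yield six bytes; in the default "C" locale the call is expected to fail, which is why the `setlocale()` call in the answer above matters.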
