I am making an OpenVG application for the Raspberry Pi that displays some text, and I need support for foreign characters (Polish in this case). I plan to write a tool in some higher-level language that maps Unicode characters to C literals, but for now I have a problem printing those literals in C.

Given the code below:

//both output the "ó" character, as expected 
char     A[] =  "\xF3"; 
wchar_t  B[] = L"\xF3"; 

//"ś" is expected as output but instead I get character with code 0x5B - "[" 
char     A[] =  "\x15B"; 
wchar_t  B[] = L"\x15B"; 

Most Polish characters have 3-digit hexadecimal codes. When I attempt to print "ś" (0x15B), the character "[" (0x5B) is printed instead. It seems I cannot print any Unicode character whose code has more than two hex digits.

Is the data type the cause? I have considered using char16_t and char32_t, but the header files are nowhere to be found on the system.

hippietrail
Paweł Duda
  • If all char strings were printed correctly until now, it won't switch to UTF-16/32 just because you think so. Maybe it can work with UTF-8, but then you have the wrong values (and the wrong assignment code) – deviantfan Jan 15 '15 at 16:02
  • It might have something to do with normal characters only being a single byte on almost all platforms, so the character `'\x15b'` can not be represented in the string. – Some programmer dude Jan 15 '15 at 16:02
  • Does `"\u015b"` give the desired results? – Wintermute Jan 15 '15 at 16:02
  • @Wintermute No, the output is still "[", so the amount of leading 0's doesn't seem to matter. – Paweł Duda Jan 15 '15 at 16:05
  • It's not just a leading `0`, it's the `\u` instead of `\x`. – Wintermute Jan 15 '15 at 16:06
  • Rather than say "When I attempt to print ś", post the _code_ that does the print. – chux - Reinstate Monica Jan 15 '15 at 16:10
  • As a sanity-check, you might want to print the size of your arrays and compare with what's expected. – Deduplicator Jan 15 '15 at 16:14
  • @Wintermute My bad. It does give a different result. Now it prints some accented A letter with a rectangle next to it. So the result is the same as if I did char A[] = "ś". I can upload photo of the said character if you want. – Paweł Duda Jan 15 '15 at 16:14
  • That's probably the UTF-8 sequence for ś as rendered through your shell's encoding. I'm afraid you've arrived in encoding hell -- the string is going to be a compile-time constant, and your compiler and your shell disagree about what the proper encoding for the string is. What locale do you use? If it's only for your own local use, you could just change it to one that uses UTF-8 and not care, or you could put ś into the string and use `recode` to convert the file to an encoding you like. Otherwise, you'll have to convert it at runtime (with iconv or similar). – Wintermute Jan 15 '15 at 16:56
  • The problem might be outside your program entirely. When you print, you just send data to stdout. What happens with it then? Where is it printed? Which character set does the shell (or calling application, whatever that might be) assume? Which font does it use, and which Unicode code points are defined in that font? If you want to know whether or not your program outputs the expected data, write it to a file instead. Then open the file in a proper editor and check its contents. – jalf Jan 15 '15 at 16:59
  • As a minimum, call `setlocale( LC_ALL, "" )` at start of `main`. – Cheers and hth. - Alf Jan 15 '15 at 17:23
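
The sanity checks suggested in the comments (compare array sizes, inspect the raw bytes rather than trusting the terminal) could be sketched roughly as follows. `format_hex` is a hypothetical helper name, not something from the question; it only turns the bytes of a narrow string into readable hex so they can be compared with the expected encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of the C string s into out as
 * space-separated lowercase hex, e.g. "c5 9b". Stops early if out is
 * too small. Returns out for convenience. */
char *format_hex(const char *s, char *out, size_t out_size)
{
    assert(s != NULL && out != NULL && out_size > 0);

    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        int n = snprintf(out + used, out_size - used, "%s%02x",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partially written byte */
            break;
        }
        used += (size_t)n;
    }
    return out;
}
```

Calling `format_hex` on a literal and printing the result next to `sizeof` of the array shows what the compiler actually stored, independently of how the terminal or font renders it.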

1 Answer

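What you need is this:

char A[] = {'\xc5', '\x9b', '\0'};

The bytes C5 9B are the UTF-8 encoding of "ś" (U+015B). The trailing '\0' is only needed if the array is passed to a function that expects a C string.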
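
Since the question mentions generating such literals for many characters, here is a minimal sketch of how those two bytes come out of the standard UTF-8 bit layout. The function name `utf8_encode` and its buffer convention are assumptions made for illustration, not part of the original answer or of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * out must have room for 5 bytes (up to 4 data bytes plus '\0').
 * Returns the number of data bytes written, or 0 if cp is not a
 * valid code point (a surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, char out[5])
{
    assert(out != NULL);
    size_t n;

    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) { /* surrogates are not characters */
            out[0] = '\0';
            return 0;
        }
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        out[0] = '\0';
        return 0;
    }

    out[n] = '\0';                          /* make the result a C string */
    return n;
}
```

For example, `utf8_encode(0x15B, buf)` returns 2 and leaves the bytes C5 9B in `buf`, matching the array above. Whether those bytes are then displayed as "ś" still depends on the terminal or renderer expecting UTF-8.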

deviantfan