
I am trying to render characters using the TTF_RenderUTF8_Blended function provided by the SDL_ttf library. I implemented keyboard input, and typing 'ä' or 'ß', for example, works fine. These are special characters of the German language, and they even fit in 8-bit extended ASCII. But when I copy and paste some Greek letters, for example, those get rendered correctly via UTF-8 as well. (Not every Unicode glyph listed at http://unicode-table.com/ renders, as I noticed during testing, but I guess that is normal because the Arial font might not contain every single glyph. Most Unicode glyphs work fine.)

My problem is that when I pass a string literal (as a const char* parameter), the non-ASCII characters aren't rendered correctly. Entering 'Ä', 'ß', or other Unicode characters with the keyboard at runtime works, but passing them in the code as a parameter (say, as a title for my game) like this does not work:

font_srf = TTF_RenderUTF8_Blended(font, "Hällö", font_clr);

I don't really understand why this is happening. What I get on the screen is:

H_ll_

Here _ stands for the typical vertical rectangle that appears in place of a character that could not be rendered, the one the speaker in this talk used as a joke in his introduction: https://www.youtube.com/watch?v=MW884pluTw8

Ironically, TTF_RenderText_Blended(font, "Hällö", font_clr); works, because 'ä' and 'ö' are encoded in 8-bit extended ASCII, but what I want is Unicode support, so that does not help.
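One possible reading of this symptom, offered as an assumption rather than a confirmed diagnosis: SDL_ttf documents the TTF_RenderText_* functions as taking Latin-1 text, so if the source file is stored in a single-byte encoding, 'ä' reaches the renderer as the single byte 0xE4. That byte is valid Latin-1 but not a valid UTF-8 sequence on its own, which would explain the rectangles. The sketch below converts Latin-1 bytes to UTF-8. The helper name latin1_to_utf8 is hypothetical and not part of SDL_ttf, and Windows-1252 differs from Latin-1 in the range 0x80 to 0x9F, which this sketch does not handle.

```cpp
#include <string>

// Hypothetical helper (not part of SDL_ttf): re-encode ISO-8859-1 (Latin-1)
// bytes as UTF-8. Every Latin-1 byte value equals its Unicode code point,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
inline std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII stays as-is
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

If the assumption holds, passing latin1_to_utf8("Hällö").c_str() to TTF_RenderUTF8_Blended should render the umlauts, though fixing the encoding of the source file itself is the cleaner route.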

Edit & Semi-Solution

I fixed the problem, though not in a very good way. Because my input works fine, I checked which values I get as input when I press 'ä', 'ß', ... on my keyboard, using the following code:

const char* c = input.c_str();

// Print each byte of the input as a signed integer
for (std::size_t i = 0; i < input.length(); i++)
{
    std::cout << int(c[i]) << " ";
}

Then I printed those characters in the following way:

const char char_array[] = {-61, -74, -61, -97, '\0'};
const char* char_pointer = char_array;

-61, -74 is 'ö' and -61, -97 is 'ß'. This matches the UTF-8 encoding, right?

  • U+00F6 | ö | C3 B6 (from UTF8 data table)
  • 256-61=195 which is C3
  • and 256-74=182 which is B6

    const char char_array[] = {'\xC3', '\xB6', '\0'};

This code works fine as well, in case some of you were wondering, and I think this is what I will keep doing for now. Looking up the hex code for a Unicode glyph isn't that hard.

But what I still can't figure out is how to get from these bytes to the extended ASCII integer value of 246 for 'ö'. Also, isn't there a more human-friendly solution to my problem?
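As a sketch of how the two bytes relate to 246: in UTF-8, a two-byte sequence stores 5 payload bits in the lead byte (110xxxxx) and 6 in the continuation byte (10xxxxxx). For C3 B6 that gives (0x03 << 6) | 0x36 = 192 + 54 = 246, which is U+00F6. The helper names to_byte and decode_utf8_2byte below are hypothetical, and the code only covers two-byte sequences (U+0080 to U+07FF).

```cpp
#include <cassert>
#include <cstdint>

// Map a char (which may be signed, e.g. -61) to its byte value 0..255.
inline unsigned to_byte(char c)
{
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: decode one two-byte UTF-8 sequence into its
// Unicode code point. Longer sequences are not handled here.
inline std::uint32_t decode_utf8_2byte(char lead, char cont)
{
    const unsigned b0 = to_byte(lead);
    const unsigned b1 = to_byte(cont);
    assert((b0 & 0xE0u) == 0xC0u); // lead byte must look like 110xxxxx
    assert((b1 & 0xC0u) == 0x80u); // continuation must look like 10xxxxxx
    return ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
}
```

With the values printed above, decode_utf8_2byte(-61, -74) yields 246 for 'ö', and decode_utf8_2byte(-61, -97) yields 223 for 'ß'.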

huzzm
  • Do you have a C++11 compiler? Might try a [`u8` string literal](http://en.cppreference.com/w/cpp/language/string_literal). – genpfault Jun 02 '16 at 15:51
  • I am not sure, but I don't think so, because my compiler does not accept u8 as a prefix. (How do I find that out, though?) As my IDE, I use Microsoft Visual Studio 2012 Express. – huzzm Jun 03 '16 at 20:57

1 Answer


If you have non-ASCII characters in a source file, the character encoding of that source code file matters. So in your text editor or IDE, you need to set the character set (e.g. UTF-8) when you save it.

Alternatively, you can use the \x... or \u.... format to specify non-ASCII characters using only ASCII characters, so source file encoding doesn't matter.

Microsoft doc, but not MS-specific:

https://msdn.microsoft.com/en-us/library/6aw8xdf2.aspx

Peter Stock
  • When I had some Unicode glyphs in my source file the IDE asked me "Do you want to resave this file as Unicode in order to maintain your data?", and I said yes, so my source code should be encoded correctly or am I getting this wrong? – huzzm Jun 01 '16 at 21:42
  • Try using the \u.... character constants in your strings instead. Then you know exactly what you're passing. If that works, investigate the source file encoding; you might not be passing what you think you are. Use a debugger and a breakpoint to examine the bytes of the char * being passed. Use a hex editor (or maybe the IDE, if it allows opening a file as raw bytes) to examine the bytes in the source file. – Peter Stock Jun 02 '16 at 06:34
  • There are 6 or 8 different Unicode encodings your editor could have saved in: UTF-7, UTF-8, UTF-16, UCS-2, and others, with or without a BOM (byte order mark). Which one did your editor save? – gman Jun 02 '16 at 15:39
  • I don't know which one my editor saved, how can I check that? – huzzm Jun 03 '16 at 20:32
  • @huzzm debugger and/or hex editor, as previous comment. – Peter Stock Jun 04 '16 at 06:46
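As a rough alternative to a hex editor, the first bytes of the source file can be inspected programmatically. The sketch below looks only for a byte order mark. The function names detect_bom and detect_file_bom are hypothetical, a file without a BOM may still be UTF-8 or a single-byte code page, and a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE by this simplified check.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: classify a byte order mark at the start of a buffer.
inline std::string detect_bom(const unsigned char* p, std::size_t n)
{
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return "UTF-8 BOM";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return "UTF-16 LE BOM";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return "UTF-16 BE BOM";
    return "no BOM";
}

// Read up to three bytes from the file and classify them.
// An unreadable or empty file also yields "no BOM".
inline std::string detect_file_bom(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    unsigned char buf[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(buf), sizeof buf);
    return detect_bom(buf, static_cast<std::size_t>(in.gcount()));
}
```

Calling detect_file_bom on the .cpp file that contains the "Hällö" literal gives a first hint about which encoding the editor chose when it offered to resave the file as Unicode.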