2

Is there a way in C++ to convert ö to o, or ß to s, or in general to map a UTF-8 character to the corresponding ASCII character?

smaryus
  • 5
    Not relevant to coding at all, but people normally write these characters with plain letters, like this: ß->ss and ö->oe – chigley Oct 14 '10 at 16:27
  • 1
    @chigley: The problem is that this depends on the language. – Chris Lercher Oct 14 '10 at 16:29
  • @chris_l - I gathered from the Eszett that the source text is in German, as I can't think of any other languages that use it! (Wiki confirmed this, except for "romanising the Sumerian language") – chigley Oct 14 '10 at 16:37
  • 1
    @chigley: Probably true in that case. But my point is that you can't simply build a table like `ö->oe`, `ü->ue` etc. and expect it to work correctly on all texts, because other languages may use different replacements (I believe it's different for `ü` in Turkish, for example). – Chris Lercher Oct 14 '10 at 16:42
  • UTF-8 can encode over a million distinct code points. How many of these are you planning to map? The obvious solution of course is to not attempt this. Modern operating systems have no trouble with Unicode. – Hans Passant Oct 14 '10 at 17:42
  • Note that there are no corresponding characters in the ASCII table for many, many characters in many different languages. Besides, language is not only alphabet and grammar: what about top-down reading order, should `cout` be able to print vertically? – alecov Oct 14 '10 at 23:18

1 Answer

1

Standard C++ has no built-in support for decoding or transforming UTF-8 text. I would suggest this library: http://utfcpp.sourceforge.net/

Alternatively, you may be able to use built-in POSIX or Windows functionality for this, but then the code is not portable.

Johan Kotlinski
  • Every 8-bit char can hold UTF-8 data. The Win32 API sucks at UTF-8, but that's different from "C++ doesn't support UTF-8". C++ has no concept of character encoding (except `<locale>` and facets), and often doesn't need one either... – rubenvb Oct 14 '10 at 17:48
  • Certainly it is not impossible to handle UTF-8 using C++, but there is no language support for it, as opposed to e.g. C#, Python or Java. – Johan Kotlinski Oct 14 '10 at 17:51
  • I think it's fair to say that a language which has no concept of encoding doesn't support UTF-8. ;) – jalf Oct 14 '10 at 18:33
  • The implementation was needed on Linux. iconv() was approximately what I needed, but it wasn't good enough, so I've created a table like ö->oe, ü->ue. Over time I will extend the table. – smaryus Oct 21 '10 at 12:28
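
A replacement table of the kind discussed in the comments (ö->oe, ü->ue, ß->ss) could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names `Replacement`, `kTable`, `sequence_length` and `to_ascii` are hypothetical, the table covers only a few German letters using German conventions, and anything not in the table is replaced by a fallback character. As noted above, other languages may need different replacements.

```cpp
#include <cstddef>
#include <string>

// Hypothetical entry: a UTF-8 byte sequence and its ASCII replacement.
struct Replacement {
    const char* utf8;   // UTF-8 bytes to look for
    const char* ascii;  // ASCII text to emit instead
};

// Deliberately tiny table using German conventions (an assumption).
static const Replacement kTable[] = {
    {"\xC3\xA4", "ae"},  // ä
    {"\xC3\xB6", "oe"},  // ö
    {"\xC3\xBC", "ue"},  // ü
    {"\xC3\x84", "Ae"},  // Ä
    {"\xC3\x96", "Oe"},  // Ö
    {"\xC3\x9C", "Ue"},  // Ü
    {"\xC3\x9F", "ss"},  // ß
};

// Number of bytes in the UTF-8 sequence that starts with `lead`.
static std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;  // stray continuation byte or invalid lead byte
}

// Copies ASCII bytes unchanged, replaces known sequences from the table,
// and emits `fallback` once for every other multi-byte sequence.
std::string to_ascii(const std::string& in, char fallback = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {
            out += in[i];
            ++i;
            continue;
        }
        std::size_t len = sequence_length(lead);
        if (len > in.size() - i) len = in.size() - i;  // truncated input
        bool found = false;
        for (const Replacement& r : kTable) {
            if (in.compare(i, len, r.utf8) == 0) {
                out += r.ascii;
                found = true;
                break;
            }
        }
        if (!found) out += fallback;
        i += len;
    }
    return out;
}
```

For example, `to_ascii("Gr\xC3\xB6\xC3\x9F" "e")` (the bytes of "Größe") returns `"Groesse"`. The literal is split in two so that the hex escape `\x9F` does not absorb the following `e`. A linear scan is fine for a handful of entries; a larger table would probably be better served by a map keyed on the decoded code point.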