
What is the recommended way to read user input that may contain special characters (e.g., accented letters) when the locale of the input is not known?

How can I safely compare a character of this input against a special character that I need to handle in some way?

Here is some sample code to illustrate the intent:

#include <iostream>
using namespace std;

int main() {
    char txt[10];
    cin.getline(txt, sizeof(txt));
    if(txt[0] == 'á')
        cout << "Special character found\n";
}

The problem is:

warning: multi-character character constant [-Wmultichar]
     if(txt[0] == 'á')
                  ^

If I use L'á' as a wide character literal, it will not match, since the input is not wide.

If I also use wchar_t and wcin.getline to read the input as wide characters, it may work on some systems and not on others, depending on the environment and the locale settings.

How can I solve this problem safely and portably? Thanks!

and roid
  • You might want to consider using a Unicode library like [ICU](http://site.icu-project.org/) – NathanOliver Jun 27 '17 at 11:34
  • The best approach, or our favorite? I've gone along well using UTF8 internally, in Windows, I have to convert back/forth to UTF-16 for UI display and input, but it's worth it. I've used exclusively `boost::locale::conv::utf_to_utf` to do the conversions. – Michaël Roy Jun 27 '17 at 13:11

1 Answer

If you don't know your locale and also need a portable solution, then I'm afraid there is no standard C++ solution for that. And I'm not sure there ever will be, given that Windows uses UTF-16. So if you need an out-of-the-box solution, it probably makes sense to look at the library mentioned in NathanOliver's comment.

Having said that, although Unicode support remains a pain point of C++ (and it's really sad that I'm writing these words in 2017), C++11 brought some improvements.

So if manual conversion is an option for you, you can benefit from some of them.

For instance, here is valid C++11 code that converts a UTF-8 byte sequence to a wide string and back:

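#include <codecvt>   // codecvt_utf8
#include <iostream>
#include <locale>    // wstring_convert
#include <string>
using namespace std;

int main() {
    unsigned char euroUTF8[] = { 0xE2, 0x82, 0xAC, 0x00 }; // Euro sign (U+20AC) in UTF-8

    wstring_convert<codecvt_utf8<wchar_t>> converter_UTF8_wchar;
    wstring euroWideStr = converter_UTF8_wchar.from_bytes((char*)euroUTF8); // UTF-8 -> wide
    wcout << euroWideStr << endl; // what is printed depends on the locale and the console

    string euroNarrowStr = converter_UTF8_wchar.to_bytes(euroWideStr); // wide -> UTF-8
    cout << euroNarrowStr << endl;
}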
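
Applied to the comparison in the question, the same facility could be used roughly as follows. This is only a sketch, and it assumes the input bytes are UTF-8, which is exactly what cannot be known in general. The helper names decodeUtf8 and startsWithCodePoint are made up for illustration. It decodes to char32_t so that one element is one code point, and it writes the character as U'\u00E1' so the result does not depend on the encoding of the source file. Note that std::wstring_convert and <codecvt> are deprecated as of C++17.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// The underlying converter reports invalid input with std::range_error.
std::u32string decodeUtf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.from_bytes(bytes);
}

// Hypothetical helper: true if the first code point of utf8Line equals expected.
// Assumption: utf8Line is UTF-8; input that fails to decode counts as "no match".
bool startsWithCodePoint(const std::string& utf8Line, char32_t expected) {
    try {
        const std::u32string codePoints = decodeUtf8(utf8Line);
        return !codePoints.empty() && codePoints[0] == expected;
    } catch (const std::range_error&) {
        return false;
    }
}
```

With a line read via std::getline(std::cin, line), the check becomes startsWithCodePoint(line, U'\u00E1'). This only matches the precomposed form of á (U+00E1). Input that uses 'a' followed by a combining acute accent (U+0301) would need normalization first, which is the kind of work a library such as ICU handles.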

For more context check this article

Vasiliy Galkin