
I need to handle multiple European languages, and I can't use UTF-8, only ANSI.

Is there a way to find the ANSI equivalent for the special characters of European languages?

Here are some example equivalence tables:

Italian

à = a'
è = e'
ì = i'
ò = o'
ù = u'

Spanish

á = ‘ + a
é = ‘ + e
í = ‘ + i
ó = ‘ + o
ú = ‘ + u
ñ = ~ + n
ü = ” + u
¡ = Alt (hold down) + !
¿ = Alt (hold down) + ?

German

Ä   Uppercase Umlaut (A)
Ö   Uppercase Umlaut (O)
Ü   Uppercase Umlaut (U)
ß   Eszett          (ss)
ä   Lowercase Umlaut (a)
ö   Lowercase Umlaut (o)
ü   Lowercase Umlaut (u)


French

À   Uppercase Accent Grave (A)
Â   Uppercase Accent Circonflex (A)
Ä   Uppercase Accent Tréma (A)
Æ   Uppercase Ligature(AE)
Ç   Uppercase Cedilla (C)
È   Uppercase Accent Grave (E)
É   Uppercase Accent Aigu (E)
Ê   Uppercase Accent Circonflex (E)
Ë   Uppercase Accent Tréma (E)
Î   Uppercase Accent Circonflex (I)
Ï   Uppercase Accent Tréma (I)
Ô   Uppercase Accent Circonflex (O)
Œ   Uppercase Ligature(OE)
Ù   Uppercase Accent Grave (U)
Û   Uppercase Accent Circonflex (U)
Ü   Uppercase Accent Tréma (U)
à   Lowercase Accent Grave (a)
â   Lowercase Accent Circonflex (a)
ä   Lowercase Accent Tréma (a)
æ   Lowercase Ligature(ae)
ç   Lowercase Cedilla (c)

For example, for Italian (my mother tongue), the accents can easily be converted this way:

  à = a'
  è = e'
  ì = i'
  ò = o'
  ù = u'

Is there an easy way to do the same for the other languages?

Thanks !

  • "I can't use utf-8 but only ansi" **why**? The whole point of Unicode is to prevent people from having to do what you're trying to do, because what you're trying to do is hideously error-prone and cannot work in all scenarios. This question feels like an X-Y problem. – Ian Kemp Feb 08 '21 at 13:05
  • Seems similar to "transliteration" though the latter tends to just suppress the accents. Here is a [C# port](https://github.com/thecoderok/Unidecode.NET) of a Perl library. – Klaus Gütter Feb 08 '21 at 14:11
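
The accent-suppressing transliteration described in the comment above can be approximated without an external library. The following is a minimal sketch, not the linked library's implementation: `AccentStripper` and `RemoveDiacritics` are hypothetical names, and the approach assumes it is acceptable to lose the accent entirely. It decomposes each character with `string.Normalize` and drops the combining marks.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AccentStripper
{
    // Hypothetical helper: splits each character into its base letter plus
    // combining marks (Unicode form D), then discards the combining marks.
    public static string RemoveDiacritics(string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));

        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Accents, umlauts, tildes and cedillas become NonSpacingMark after decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left so the result is in the usual form C.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Characters that have no decomposition, such as ß, Æ and Œ, pass through unchanged, so they would still need an explicit mapping.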

1 Answer


The only way I can think of is to do a manual mapping.

I would define some kind of mapper which, based on the language, returns the character "normalized to ANSI" (so a country-specific UTF-8 character becomes an ANSI character).

Here's a draft of what I am talking about:

using System;
using System.Collections.Generic;
using System.Globalization;

public static class CountrySpecificMapper
{
    private static Dictionary<char, string> _frenchDict = new Dictionary<char, string>()
    {
        {'À',"A"},
        {'Â',"A"},
        {'Ä',"A"},
        {'Æ',"AE"},
        {'Ç',"C"},
        {'È',"E"},
        {'É',"E"},
        {'Ê',"E"},
        {'Ë',"E"},
        {'Î',"I"},
        {'Ï',"I"},
        {'Ô',"O"},
        {'Œ',"OE"},
        {'Ù',"U"},
        {'Û',"U"},
        {'Ü',"U"},
        {'à',"a"},
        {'â',"a"},
        {'ä',"a"},
        {'æ',"ae"},
        {'ç',"c"},
    };

    private static Dictionary<char, string> _germanDict = new Dictionary<char, string>()
    {
        {'Ä', "A"},
        {'Ö', "O"},
        {'Ü', "U"},
        {'ß', "ss"},
        {'ä', "a"},
        {'ö', "o"},
        {'ü', "u"},
    };

    private static Dictionary<CultureInfo, Dictionary<char, string>> _langToDict = new Dictionary<CultureInfo, Dictionary<char, string>>()
    {
        {new CultureInfo("fr"), _frenchDict },
        {new CultureInfo("de"), _germanDict },
    };

    public static string MapCharacter(char @char, CultureInfo cultureInfo)
    {
        if (cultureInfo is null) throw new ArgumentNullException(nameof(cultureInfo));

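        // Characters (or cultures) without an entry are returned unchanged;
        // throw here instead if an unmapped character should be an error.
        if (!_langToDict.TryGetValue(cultureInfo, out var dict) ||
            !dict.TryGetValue(@char, out var mapped))
        {
            return @char.ToString();
        }
        return mapped;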
    }
}

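Note that CultureInfo is a reference type, but it overrides Equals and GetHashCode, so a lookup with a CultureInfo object that outer code creates for the same culture name does find the entry. The real limitation of using it as a key here is that the keys are the neutral cultures "fr" and "de": a specific culture such as "fr-FR" is a different key and will not be found.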

THE CODE ABOVE IS FOR PRESENTATION PURPOSES ONLY.

You could rely on the LCID property of CultureInfo, or define your own keys that are better suited to your solution.
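
As an illustration of the "own keys" option, here is a minimal sketch that keys the tables by a plain language code string. The class name `OwnKeyMapper`, the choice of two-letter codes, and the decision to return unmapped characters unchanged are all assumptions made for this example. The Italian table uses the convention from the question, and the German table is abbreviated.

```csharp
using System;
using System.Collections.Generic;

public static class OwnKeyMapper
{
    // Italian table from the question: the accent becomes a trailing apostrophe.
    private static readonly Dictionary<char, string> Italian = new Dictionary<char, string>
    {
        { 'à', "a'" }, { 'è', "e'" }, { 'ì', "i'" }, { 'ò', "o'" }, { 'ù', "u'" },
    };

    // Lowercase German entries only, to keep the sketch short.
    private static readonly Dictionary<char, string> German = new Dictionary<char, string>
    {
        { 'ä', "a" }, { 'ö', "o" }, { 'ü', "u" }, { 'ß', "ss" },
    };

    // Hypothetical key scheme: two-letter language codes, compared case-insensitively.
    private static readonly Dictionary<string, Dictionary<char, string>> Tables =
        new Dictionary<string, Dictionary<char, string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "it", Italian },
            { "de", German },
        };

    public static string MapCharacter(char c, string languageCode)
    {
        if (languageCode is null) throw new ArgumentNullException(nameof(languageCode));

        // Unknown languages and unmapped characters fall through unchanged.
        if (Tables.TryGetValue(languageCode, out var table) &&
            table.TryGetValue(c, out var mapped))
        {
            return mapped;
        }
        return c.ToString();
    }
}
```

A caller that already holds a CultureInfo could pass its `TwoLetterISOLanguageName`, which should let "de", "de-DE" and "de-AT" share one table.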

After all that work is done, converting a diacritic character is as simple as:

var convertedChar = CountrySpecificMapper.MapCharacter(charToConvert, languageKey);
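
Real input is usually a whole string rather than a single character. A small sketch of how a per-character mapper could be applied to a string follows; `StringMapping` and `MapString` are hypothetical names, and the per-character mapper is passed in as a delegate so the helper does not depend on any particular table.

```csharp
using System;
using System.Text;

public static class StringMapping
{
    // Applies a char-to-string mapper to every character and joins the results.
    public static string MapString(string text, Func<char, string> mapCharacter)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (mapCharacter is null) throw new ArgumentNullException(nameof(mapCharacter));

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(mapCharacter(c));
        }
        return sb.ToString();
    }
}
```

It could be called as `StringMapping.MapString(input, c => CountrySpecificMapper.MapCharacter(c, languageKey))`, assuming the mapper returns characters it does not know unchanged rather than throwing.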
Michał Turczyn