Why does "Ū" come before "U"?

    CultureInfo ci = CultureInfo.GetCultureInfo("lt-LT");
    bool ignoreCase = true; // true = ignore case (case-insensitive comparison)
    StringComparer comp = StringComparer.Create(ci, ignoreCase);
    string[] unordered = { "Za", "Žb", "Ūa", "Ub" };
    var ordered = unordered.OrderBy(s => s, comp);

Result of ordered: Ūa Ub Za Žb

Should be: Ub Ūa Za Žb

Here are the Lithuanian letters in order: https://www.assorti.lt/userfiles/uploader/no/norvegiska-lietuviska-delione-abecele-maxi-3-7-m-vaikams-larsen.jpg

TylerH
    http://stackoverflow.com/questions/1371813/why-does-string-compare-seem-to-handle-accented-characters-inconsistently – Matteo Umili Sep 23 '16 at 08:31

1 Answer

I just put together what could be a (limited) solution to your problem. It is not optimized, but it should give you an idea of how to solve it. I created a LithuanianString class whose only job is to wrap your string. The class implements IComparable<LithuanianString>, so a list of LithuanianString can be sorted directly.

Here is what could be the class:

    using System;

    public class LithuanianString : IComparable<LithuanianString>
    {
        // Lithuanian alphabet in collation order; a character's position is its sort key.
        const string UpperAlphabet = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ";
        const string LowerAlphabet = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž";
        public string String;

        public LithuanianString(string inputString)
        {
            this.String = inputString;
        }

        public int CompareTo(LithuanianString other)
        {
            // By convention, any instance sorts after null
            if (other == null) return 1;

            var maxIndex = Math.Min(this.String.Length, other.String.Length);
            for (var i = 0; i < maxIndex; i++)
            {
                var indexOfThis = IndexInAlphabet(this.String[i]);
                var indexOfOther = IndexInAlphabet(other.String[i]);

                if (indexOfOther != indexOfThis)
                    return indexOfThis - indexOfOther;
            }
            // Equal over the common prefix: the shorter string sorts first
            return this.String.Length - other.String.Length;
        }

        // Case-insensitive position of c in the alphabet.
        // Characters outside the alphabet yield -1, so they sort before every letter.
        static int IndexInAlphabet(char c)
        {
            var index = LowerAlphabet.IndexOf(c);
            return index >= 0 ? index : UpperAlphabet.IndexOf(c);
        }
    }

And here is a sample I wrote to try it:

    static void Main(string[] args)
    {
        string[] unordered = { "Za", "Žb", "Ūa", "Ub" };

        // Wrap each string in a LithuanianString (requires using System.Linq)
        var lithuanianStringList = (from unorderedString in unordered
                                    select new LithuanianString(unorderedString)).ToList();

        // Sort using LithuanianString.CompareTo
        lithuanianStringList.Sort();

        // Display the result
        Console.WriteLine(Environment.NewLine + "My Comparison");
        lithuanianStringList.ForEach(c => Console.WriteLine(c.String));
    }

The output is the expected one:

Ub Ūa Za Žb

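This class lets you define a custom alphabet simply by replacing the characters in the two constants at the top. Both constants must list the same letters in the same order, since a letter's position is what determines its sort order.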
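
If you would rather keep working with plain strings and OrderBy, as in the question, the same idea could be packaged as an IComparer<string>. The following is only a sketch, not a replacement for proper culture-aware collation: AlphabetComparer and Demo are hypothetical names I made up for illustration, and treating characters outside the alphabet as sorting before every letter is an assumption carried over from the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original answer): compares strings
// character by character using positions in a caller-supplied alphabet.
public sealed class AlphabetComparer : IComparer<string>
{
    private readonly Dictionary<char, int> _rank = new Dictionary<char, int>();

    // 'upper' and 'lower' must list the same letters in the same order.
    public AlphabetComparer(string upper, string lower)
    {
        if (upper.Length != lower.Length)
            throw new ArgumentException("Alphabets must have equal length.");
        for (var i = 0; i < upper.Length; i++)
        {
            _rank[upper[i]] = i; // both cases share one rank,
            _rank[lower[i]] = i; // so the comparison ignores case
        }
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        var common = Math.Min(x.Length, y.Length);
        for (var i = 0; i < common; i++)
        {
            var diff = Rank(x[i]).CompareTo(Rank(y[i]));
            if (diff != 0) return diff;
        }
        // Equal over the common prefix: the shorter string sorts first
        return x.Length.CompareTo(y.Length);
    }

    // Assumption: characters outside the alphabet sort before every letter
    // (and compare as equal to each other).
    private int Rank(char c)
    {
        int rank;
        return _rank.TryGetValue(c, out rank) ? rank : -1;
    }
}

public static class Demo
{
    public static void Main()
    {
        var comparer = new AlphabetComparer(
            "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ",
            "aąbcčdeęėfghiįyjklmnoprsštuųūvzž");

        string[] unordered = { "Za", "Žb", "Ūa", "Ub" };
        var ordered = unordered.OrderBy(s => s, comparer);

        Console.WriteLine(string.Join(" ", ordered));
    }
}
```

With the sample array from the question this should give the same order as above (Ub Ūa Za Žb). Building the lookup table once in the constructor avoids scanning the alphabet strings for every character compared, which may matter for longer lists.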

Vasilievski