16

I have a List like this:

    List<string> items = new List<string>();
    items.Add("-");
    items.Add(".");
    items.Add("a-");
    items.Add("a.");
    items.Add("a-a");
    items.Add("a.a");

    items.Sort();

    string output = string.Empty;
    foreach (string s in items)
    {
        output += s + Environment.NewLine;
    }

    MessageBox.Show(output);

The output is coming back as

-
.
a-
a.
a.a
a-a

whereas I am expecting the results as

-
.
a-
a.
a-a
a.a

Any idea why "a-a" does not come before "a.a", whereas "a-" comes before "a."?

Massimiliano
Satya

3 Answers

18

I suspect that in the last case "-" is treated in a different way due to culture-specific settings (perhaps as a "dash" as opposed to "minus" in the first strings). MSDN warns about this:

The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.

Also see in this MSDN page:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them; for example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases; therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

So the hyphen gets special treatment in the default sort mode (word sort) in order to make the ordering look more "natural".
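To see the difference between word sort and string sort described in the quote, a small sketch along these lines may help. The class name `SortModeDemo` and the helper `SortWith` are made up for illustration, and the invariant culture is only an assumption; the exact ordering printed can depend on the culture and on the runtime's globalization backend, so no expected output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SortModeDemo
{
    // Hypothetical helper: sorts a copy of the input using the given
    // culture's CompareInfo together with the requested CompareOptions.
    public static List<string> SortWith(IEnumerable<string> source,
                                        CultureInfo culture,
                                        CompareOptions options)
    {
        var sorted = new List<string>(source);
        CompareInfo compareInfo = culture.CompareInfo;
        sorted.Sort((x, y) => compareInfo.Compare(x, y, options));
        return sorted;
    }

    public static void Main()
    {
        var items = new[] { "-", ".", "a-", "a.", "a-a", "a.a" };

        // Assumption: the invariant culture stands in for "the current culture".
        CultureInfo culture = CultureInfo.InvariantCulture;

        // Word sort: the default culture-sensitive mode (hyphen may get a special weight).
        Console.WriteLine(string.Join(" ", SortWith(items, culture, CompareOptions.None)));

        // String sort: culture-sensitive, but without the special cases for symbols.
        Console.WriteLine(string.Join(" ", SortWith(items, culture, CompareOptions.StringSort)));
    }
}
```

Running both lines side by side should make it visible whether the hyphen is being weighted specially on a given machine.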

You can get "normal" ordinal sort if you specifically turn it on:

     Console.WriteLine(string.Compare("a.", "a-"));                  //1
     Console.WriteLine(string.Compare("a.a", "a-a"));                //-1

     Console.WriteLine(string.Compare("a.", "a-", StringComparison.Ordinal));    //1
     Console.WriteLine(string.Compare("a.a", "a-a", StringComparison.Ordinal));  //1

To sort the original collection using ordinal comparison use:

     items.Sort(StringComparer.Ordinal);
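
As a self-contained sketch of the ordinal approach applied to the list from the question (the class name `OrdinalSortDemo` and the helper `SortOrdinal` are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSortDemo
{
    // Hypothetical helper: returns a copy of the input sorted by UTF-16
    // code unit value, independent of the current thread's culture.
    public static List<string> SortOrdinal(IEnumerable<string> source)
    {
        var sorted = new List<string>(source);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        var items = new List<string> { "-", ".", "a-", "a.", "a-a", "a.a" };

        // '-' is U+002D and '.' is U+002E, so '-' wins at every position.
        Console.WriteLine(string.Join(Environment.NewLine, SortOrdinal(items)));
        // Prints, one per line: -  .  a-  a-a  a.  a.a
    }
}
```

Note that ordinal order places "a-a" before "a.", because the comparison is decided at the second character. That is consistent across cultures, but it is not identical to the order listed as expected in the question.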
Massimiliano
6

If you want your string sort to be based on the numeric value of each character (its UTF-16 code unit) as opposed to the rules defined by the current culture, you can sort by Ordinal:

    items.Sort(StringComparer.Ordinal);

This will make the results consistent across all cultures. Keep in mind that it compares characters purely by code value, so all uppercase letters sort before lowercase ones ("Z" before "a"), and, as with any plain string sort, "14" comes before "9", which may or may not be what you're looking for.

Jared Shaver
  • Thanks Jared. Could you tell me how I can sort if the data is in a column of a DataTable? `DataTable dataTable = new DataTable(); dataTable.Columns.Add("Item", typeof (string)); dataRow = dataTable.NewRow(); dataRow["Item"] = "a-a"; dataTable.Rows.Add(dataRow); dataRow = dataTable.NewRow(); dataRow["Item"] = "a.a"; dataTable.Rows.Add(dataRow); DataRow[] rows = dataTable.Select("", "Item ASC");` – Satya Feb 20 '12 at 01:42
4

The Sort method of the List<> class relies on the default string comparer of the .NET Framework, which performs a culture-sensitive comparison based on the current CultureInfo of the thread.

The CultureInfo specifies the alphabetical order of characters, and it seems that the default one uses an order different from what you would expect.

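When sorting, you can specify a particular CultureInfo, one that you know will match your sorting requirements. List<>.Sort does not accept a CultureInfo directly, so wrap it in a StringComparer. For example (German culture):

    var sortCulture = new CultureInfo("de-DE");
    items.Sort(StringComparer.Create(sortCulture, false)); // false = case-sensitive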

More info can be found here:
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx
http://msdn.microsoft.com/de-de/library/system.stringcomparer.aspx

ntziolis
  • What is not clear: "-" (hyphen) comes before "." (dot) and "a-" before "a."; why not "a-a" before "a.a"? – Satya Feb 20 '12 at 01:15
  • Theoretically, the current culture might consider `.` and `-` to be the same order. The `.Sort` method is "unstable", which means that the order of equal items is not guaranteed. – Scott Rippey Feb 20 '12 at 01:20
  • 1
    I tested on US English and got the same results as the OP. Even when testing using String.Compare, I never got 0 (equal). I either got -1 or 1, depending on which was first. So it is probably not a problem with the .Sort method. – John Koerner Feb 20 '12 at 01:22
  • I tried `System.Threading.Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); items.Sort();` but the results haven't changed – Satya Feb 20 '12 at 01:27
  • I think Yacoder has cracked the case in his answer, it's the word sort thing that introduces this special handling – ntziolis Feb 20 '12 at 01:33