14

Let's say I'm using the LINQ array .Distinct() method. The result is unordered.

Well, everything is "ordered" if you know the logic used to produce the result.

My question is about the result set. Will the resulting array be in the "first distinct" order or perhaps the "last distinct" order?

Can I never count on any order?

This is the old "remove duplicate strings" problem but I'm looking into the LINQ solution.

jball
Matthew

5 Answers

21

Assuming you mean LINQ to Objects, it basically keeps a set of all the results it's returned so far, and only yields the "current" item if it hasn't been yielded before. So the results are in the original order, with duplicates removed. Something like this (except with error checking etc):

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source)
{
    HashSet<T> set = new HashSet<T>();

    foreach (T item in source)
    {
        if (set.Add(item))
        {
            // New item, so yield it
            yield return item;
        }
    }
}

This isn't guaranteed - but I can't imagine any more sensible implementation. This allows Distinct() to be as lazy as it can be - data is returned as soon as it can be, and only the minimum amount of data is buffered.

Relying on this would be a bad idea, but it can be instructive to know how the current implementation (apparently) works. In particular, you can easily observe that it starts returning data before exhausting the original sequence, simply by creating a source which logs when it produces data to be consumed by Distinct, and also logging when you receive data from Distinct.
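The logging experiment described above could be sketched roughly as follows. The names `LazyDistinctDemo`, `LoggingSource` and `Run` are made up for illustration, and the interleaving you see is a property of whichever implementation you run it against, not something the documentation promises.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDistinctDemo
{
    // Hypothetical helper: logs each item just before handing it to the consumer.
    public static IEnumerable<string> LoggingSource(IEnumerable<string> items, List<string> log)
    {
        foreach (string item in items)
        {
            log.Add("produced " + item);
            yield return item;
        }
    }

    // Pipes the logging source through Distinct() and logs each result as it arrives.
    public static List<string> Run()
    {
        var log = new List<string>();
        string[] input = { "b", "a", "b", "c", "a" };

        foreach (string item in LoggingSource(input, log).Distinct())
        {
            log.Add("received " + item);
        }

        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

With the LINQ to Objects implementations I'm aware of, "received b" is logged immediately after "produced b", well before the source is exhausted, and the received items come out as b, a, c (first-occurrence order).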

Jon Skeet
  • 3
    You could also just add your own extension method (e.g. DistinctOrdered) with the implementation Jon provided. That way you would always have an implementation with a defined order regardless of the .NET Framework version. – Karsten Nov 04 '15 at 19:37
  • adding to the [Jon Skeet Facts](http://meta.stackexchange.com/questions/9134/jon-skeet-facts) - The [.NET Reference Source](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,4ab583c7d8e84d6d) is based on Jon Skeet's answers – Slai Jan 17 '17 at 17:19
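A minimal sketch of the extension method Karsten suggests, assuming you want first-occurrence order as a documented property of your own code. `DistinctOrdered` is the name from the comment; the optional `comparer` parameter, the split into a validating wrapper plus a private iterator, and the class names are my additions, not anything taken from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedDistinctExtensions
{
    // Validates eagerly, then defers to the iterator, so a null source
    // fails at the call site instead of on first enumeration.
    public static IEnumerable<T> DistinctOrdered<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return DistinctOrderedIterator(source, comparer);
    }

    private static IEnumerable<T> DistinctOrderedIterator<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
        var seen = new HashSet<T>(comparer);

        foreach (T item in source)
        {
            // Add returns false for items already in the set, so only
            // the first occurrence of each value is yielded.
            if (seen.Add(item)) yield return item;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string[] words = { "pear", "apple", "pear", "fig", "apple" };
        Console.WriteLine(string.Join(", ", words.DistinctOrdered()));
        // prints "pear, apple, fig"
    }
}
```

Because the ordering comes from your own loop rather than from `Enumerable.Distinct`, it cannot change with a framework update.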
8

The docs say:

"The result sequence is unordered."

Gabriel Magana
  • I know this. My point is that the notion that the order is "random" really doesn't hold water... unless the method is something completely foreign to me. – Matthew Nov 05 '10 at 20:52
  • 4
    @matthew: Ok, but you ask "Can I never count on any order?" Since the docs clearly state that the result is unordered, then you cannot count on any order. If it is in a certain order today, with the next .NET bugfix that may change, since there is no order guarantee. – Gabriel Magana Nov 05 '10 at 20:54
  • 1
    @matthew: Check Jon's answer. At best, the order is the same order the data came in, but as everyone has been saying, and as per the docs, there is no guarantee of any particular order. If you need a particular order, add an OrderBy to the LINQ query, e.g. `var result = sourceItems.Distinct().OrderBy(item => item.ValueToOrderOn);` – Will Nov 05 '10 at 21:04
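For the "remove duplicate strings" case from the question, Will's suggestion might look like the sketch below. `SortedDistinctDemo` and `DistinctSorted` are illustrative names, and ordinal string comparison is an assumption on my part; pick whichever comparer matches the order you actually need.

```csharp
using System;
using System.Linq;

public static class SortedDistinctDemo
{
    // Removes duplicates, then imposes an explicit order so the result
    // no longer depends on how Distinct() happens to be implemented.
    public static string[] DistinctSorted(string[] items)
    {
        return items
            .Distinct()
            .OrderBy(item => item, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        string[] input = { "pear", "apple", "pear", "fig", "apple" };
        Console.WriteLine(string.Join(", ", DistinctSorted(input)));
        // prints "apple, fig, pear"
    }
}
```

Note that this gives a sorted order, not the original order; if you need "first distinct" order specifically, sorting is not a substitute.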
3

You can never count on any order. It would be entirely permissible for LINQ to implement this using hash tables (and indeed, I believe it IS implemented that way in .NET 4).

Billy ONeal
1

The Distinct method doesn't officially guarantee an order as far as I know, although in practice the LINQ to Objects implementation returns the elements in the order they first appear in the source enumerable.

If you use LINQ to SQL, for example, then it is up to the database to decide what order to return the results in, and you should not rely on that order even being consistent from one call to the next.

Mark Byers
1

At a guess, it's using a hash table to produce the set of distinct keys and producing the output in order of the hashes.

Jerry Coffin