Assuming you mean LINQ to Objects, it basically keeps a set of all the results it's returned so far, and only yields the "current" item if it hasn't been yielded before. So the results are in the original order, with duplicates removed. Something like this (except with error checking etc):
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source)
    {
        HashSet<T> set = new HashSet<T>();
        foreach (T item in source)
        {
            if (set.Add(item))
            {
                // New item, so yield it
                yield return item;
            }
        }
    }
This behaviour (in particular the ordering) isn't guaranteed, but I can't imagine any more sensible implementation. It allows Distinct() to be as lazy as it can be: data is returned as soon as it can be, and only the minimum amount of data is buffered.
Relying on this would be a bad idea, but it can be instructive to know how the current implementation (apparently) works. In particular, you can easily observe that it starts returning data before exhausting the original sequence, simply by creating a source which logs when it produces data to be consumed by Distinct, and also logging when you receive data from Distinct.
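Here's a rough sketch of that experiment. The names DistinctLazinessDemo, LoggingSource and Run are made up for illustration; only Enumerable.Distinct is the real LINQ method. It assumes a streaming implementation along the lines described above, so treat the interleaving as an observation about the current behaviour rather than a contract.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctLazinessDemo
{
    // Hypothetical helper: yields each value, logging the moment it is produced.
    public static IEnumerable<int> LoggingSource(int[] values, List<string> log)
    {
        foreach (int value in values)
        {
            log.Add("Producing " + value);
            yield return value;
        }
    }

    // Consumes Distinct() over the logging source and logs each item received.
    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (int item in LoggingSource(new[] { 1, 2, 1, 3, 2 }, log).Distinct())
        {
            log.Add("Received " + item);
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

If Distinct buffered the whole source first, all the "Producing" lines would come before any "Received" line. With a streaming implementation you'd instead expect them interleaved ("Producing 1", "Received 1", "Producing 2", "Received 2", "Producing 1", "Producing 3", "Received 3", "Producing 2"), with the duplicates produced but never received.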