
I want to understand how `Distinct` in LINQ compares two instances, for example in the code below. I don't understand why `Distinct` needs `GetHashCode`, since it seems all it has to do is call the `Equals` method to compare them.

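    using System.Collections.Generic;
    using System.Linq;

    public class book {
        public int Id { get; set; }
        public string name { get; set; }

        // Two books are equal when their Ids match.
        public override bool Equals(object obj) {
            var item = obj as book;
            if (item == null) return false; // obj is null or not a book
            return this.Id == item.Id;
        }

        // Must agree with Equals: books with the same Id get the same hash code.
        public override int GetHashCode() {
            return this.Id.GetHashCode();
        }
    }

    class Program {
        static void Main(string[] args) {
            book b = new book() { Id = 2 };
            book c = new book() { Id = 2 };
            List<book> listOfBooks = new List<book>() { new book() { Id = 1 }, new book() { Id = 1 }, c };

            // Distinct (default equality comparer) locates candidates via GetHashCode,
            // then confirms with Equals: the result holds two books (Id 1 and Id 2).
            var distinctBooks = listOfBooks.Distinct().ToList();

            // == is not overloaded for book, so this compares references: false.
            bool sameReference = b == c;
            // Equals uses the override above: true.
            bool sameId = b.Equals(c);
        }
    }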
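To see why `Equals` alone is not enough, here is a minimal sketch of what a `Distinct` that only calls `Equals` would have to do. `DistinctByEquals` and the `EqualsCalls` counter are hypothetical names for illustration; this is not how LINQ implements `Distinct`. With no hash code there is no way to jump to a likely match, so each new element is compared against every distinct element kept so far, and N all-distinct elements cost N(N-1)/2 calls to `Equals`.

```csharp
using System;
using System.Collections.Generic;

public static class EqualsOnlyDistinct
{
    // Hypothetical counter so the cost of this approach is visible.
    public static int EqualsCalls;

    // Hypothetical Equals-only Distinct: O(N^2) comparisons in the worst case.
    public static List<T> DistinctByEquals<T>(IEnumerable<T> source)
    {
        var kept = new List<T>();
        foreach (var item in source)
        {
            bool seen = false;
            // No hash code: compare against every element kept so far.
            foreach (var k in kept)
            {
                EqualsCalls++;
                if (Equals(k, item)) { seen = true; break; }
            }
            if (!seen) kept.Add(item);
        }
        return kept;
    }

    public static void Main()
    {
        var ids = new List<int>();
        for (int i = 0; i < 1000; i++) ids.Add(i); // 1000 distinct values
        var result = DistinctByEquals(ids);
        // Prints "1000 distinct, 499500 Equals calls" (0 + 1 + ... + 999).
        Console.WriteLine($"{result.Count} distinct, {EqualsCalls} Equals calls");
    }
}
```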
gvivetapl
  • It reduces the number of comparisons by a factor of 4 billion, assuming a well-distributed hash. – Raymond Chen Jun 08 '17 at 15:55
  • Because `Distinct` uses a hash set internally. If it did not, it would have to compare all N elements to all other N-1 elements, resulting in a lot of unnecessary comparisons. – Evk Jun 08 '17 at 16:01
  • @RaymondChen: it reduces the number of comparisons considerably, but in most cases, not by the magic number 2^32. Selecting distinct elements in a collection of N elements is O(N^2) if we can't hash (or sort). – Jeroen Mostert Jun 08 '17 at 16:05
  • Actually, I don't think it reduces the number of comparisons at all. It just makes them more efficient -- comparing two ints as opposed to two objects. – James Curran Jun 08 '17 at 16:08
  • @JamesCurran but based on the hash code (an int) it chooses a bucket in the hash table. Then it compares the target item only with the items in that bucket, not with all items in the list. So the number of comparisons is reduced. – Evk Jun 08 '17 at 16:10
  • @Evk Then the efficiency comes from the implementation using a hash table -- and, more importantly, caching the hash code rather than calling GetHashCode() every time. The hash code itself is just a minor implementation detail. – James Curran Jun 08 '17 at 16:18
  • @JamesCurran: no. If you can't use a hash, then you must compare each element to all the distinct elements already established. Even with auxiliary storage, this takes more comparisons than hashing. It has *nothing at all* to do with caching the hash code; in fact, caching is unnecessary for the speedup. The hash gets its speed from the ability to index directly into an array to test for membership, something which is simply not possible with arbitrary objects. If you don't believe this, please give me an O(N) implementation of `.Distinct` for `IEquatable`s without hashing. – Jeroen Mostert Jun 08 '17 at 16:23
  • @JeroenMostert Clarify what you mean by "If you can't use a hash". I'm acknowledging the use of a hash table. However, the concept of it is "choose a bucket, based on the object". The actual hash code is merely an implementation detail. – James Curran Jun 08 '17 at 16:31
  • @JamesCurran: your original claim was "I don't think it reduces the number of comparisons at all". The availability of `GetHashCode` is what enables the use of a hash table, which directly reduces the number of comparisons using `Equals` as compared with any algorithm that can't. If your claim is still that this isn't true, it's false. If you're claiming anything else, we're probably talking past each other. – Jeroen Mostert Jun 08 '17 at 17:57
  • @JamesCurran: note that, in particular, the only way to "choose a bucket" if you *don't* have `GetHashCode` is to compare the object with every other bucket already established, which is O(N). If you do have `GetHashCode`, it's O(1). This is not an implementation detail; it's an order of magnitude. The *exact nature* of the hash indeed does not matter, but we must be able to determine the bucket in O(1), which is not possible if we only have `Equals`. – Jeroen Mostert Jun 08 '17 at 18:03
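The bucket argument in these comments can be made concrete with a minimal sketch. `DistinctByBuckets` is a hypothetical helper that groups elements by `GetHashCode()` in a `Dictionary<int, List<T>>` and calls `Equals` only for elements in the matching bucket. LINQ's real `Distinct` uses an internal hash set rather than this exact code, so treat it as an illustration of the idea, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;

public static class BucketDistinct
{
    // Hypothetical counter of Equals calls, to compare with an Equals-only loop.
    public static int EqualsCalls;

    public static List<T> DistinctByBuckets<T>(IEnumerable<T> source)
    {
        var buckets = new Dictionary<int, List<T>>();
        var result = new List<T>();
        foreach (var item in source)
        {
            // The hash code selects the bucket directly; no Equals calls needed to find it.
            int hash = item == null ? 0 : item.GetHashCode();
            if (!buckets.TryGetValue(hash, out var bucket))
            {
                bucket = new List<T>();
                buckets[hash] = bucket;
            }
            bool seen = false;
            // Equals is only called for elements that share this hash code.
            foreach (var candidate in bucket)
            {
                EqualsCalls++;
                if (Equals(candidate, item)) { seen = true; break; }
            }
            if (!seen)
            {
                bucket.Add(item);
                result.Add(item);
            }
        }
        return result;
    }

    public static void Main()
    {
        var ids = new List<int>();
        for (int i = 0; i < 1000; i++) ids.Add(i); // 1000 distinct values
        var result = DistinctByBuckets(ids);
        // Prints "1000 distinct, 0 Equals calls": every value lands in its own bucket.
        Console.WriteLine($"{result.Count} distinct, {EqualsCalls} Equals calls");
    }
}
```

For the `book` class in the question, two books with the same `Id` produce the same hash code, land in the same bucket, and `Equals` makes the final decision. If `GetHashCode` returned a constant, every element would share one bucket and this sketch would degrade to the Equals-only, O(N^2) loop.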
