This question and Jon's answer made me aware this even existed, so I got curious and launched Visual Studio.


I followed one of the examples on the MSDN page, and then I created my own little example. It's as follows:

using System;
using System.Collections.Generic;

public class Person : IEquatable<Person>
{
    public string IdNumber { get; set; }
    public string Name { get; set; }

    // Two people are equal when their IdNumber matches; null is never equal.
    public bool Equals(Person otherPerson)
    {
        if (ReferenceEquals(otherPerson, null))
            return false;

        return IdNumber == otherPerson.IdNumber;
    }

    // Returns false (rather than throwing) for null or for non-Person objects.
    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    // Must agree with Equals: equal IdNumbers give equal hash codes.
    public override int GetHashCode()
    {
        return IdNumber == null ? 0 : IdNumber.GetHashCode();
    }

    // Null-safe: two nulls are equal, and a null never equals a non-null.
    public static bool operator ==(Person person1, Person person2)
    {
        if (ReferenceEquals(person1, null))
            return ReferenceEquals(person2, null);

        return person1.Equals(person2);
    }

    public static bool operator !=(Person person1, Person person2)
    {
        return !(person1 == person2);
    }
}

So I have a couple of questions:

  1. If the Equals method does a good job at handling my custom equality, why do I have to override the GetHashCode method as well?

  2. When comparing something like below, which comparer is used, the Equals or the GetHashCode?


static void Main(string[] args)
{
    Person sergio = new Person() { IdNumber = "1", Name = "Sergio" };
    Person lucille = new Person() { IdNumber = "2", Name = "Lucille" };

    List<Person> people = new List<Person>(){
        sergio,
        lucille
    };

    Person lucille2 = new Person() { IdNumber = "2", Name = "Lucille" };
    if (people.Contains(lucille2))
    {
        Console.WriteLine("Already exists.");
    }

    Console.ReadKey();
}

  3. What exactly do the operator methods do? It looks like some sort of voodoo black magic going on there.
Only Bolivian Here

4 Answers


If the Equals method does a good job at handling my custom equality, why do I have to override the GetHashCode method as well?

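This allows your type to be used in collections that work via hashing, such as being a key in a Dictionary<TKey, TValue> or an element of a HashSet<T>. Those collections assume that two objects that are equal according to Equals also return the same hash code; if you override only Equals, equal objects can end up in different buckets and lookups will miss them.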
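Here is a minimal sketch of what goes wrong. The types IdOnlyEquals and IdWithHash and the HashSetDemo helper are hypothetical names made up for this illustration; they are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: overrides Equals but keeps the default, reference-based
// GetHashCode (the compiler flags this with warning CS0659).
public sealed class IdOnlyEquals
{
    public string Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as IdOnlyEquals;
        return other != null && Id == other.Id;
    }
}

// Hypothetical type: Equals and GetHashCode are both derived from Id.
public sealed class IdWithHash
{
    public string Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as IdWithHash;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id == null ? 0 : Id.GetHashCode();
    }
}

public static class HashSetDemo
{
    // Adds two "equal" instances and reports how many the set kept.
    public static int CountWithHash()
    {
        var set = new HashSet<IdWithHash>
        {
            new IdWithHash { Id = "2" },
            new IdWithHash { Id = "2" },
        };
        return set.Count; // 1: same hash code, then Equals confirms the match
    }

    public static int CountWithoutHash()
    {
        var set = new HashSet<IdOnlyEquals>
        {
            new IdOnlyEquals { Id = "2" },
            new IdOnlyEquals { Id = "2" },
        };
        // Usually 2: the default hash codes almost always differ, so Equals
        // is never consulted and the duplicate slips in.
        return set.Count;
    }
}
```

The second result is "usually 2" rather than "always 2" because nothing stops two default hash codes from colliding by chance, which is exactly why the behaviour is unreliable.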

When comparing something like below, which comparer is used, the Equals or the GetHashCode?

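GetHashCode is not used for comparisons, only for hashing operations. In your example, List<Person>.Contains goes through the default equality comparer, which calls your IEquatable<Person>.Equals; GetHashCode is never called.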
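If you want to see this for yourself, one way is to count the calls. This is only a sketch: CountingPerson and ContainsDemo are hypothetical names, and the counters exist purely for the demonstration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Person that counts calls to each method.
public sealed class CountingPerson : IEquatable<CountingPerson>
{
    public static int EqualsCalls;
    public static int HashCalls;

    public string IdNumber { get; set; }

    public bool Equals(CountingPerson other)
    {
        EqualsCalls++;
        return other != null && IdNumber == other.IdNumber;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CountingPerson);
    }

    public override int GetHashCode()
    {
        HashCalls++;
        return IdNumber == null ? 0 : IdNumber.GetHashCode();
    }
}

public static class ContainsDemo
{
    // Linear search: List<T>.Contains only ever asks Equals.
    public static bool ListContains()
    {
        var people = new List<CountingPerson>
        {
            new CountingPerson { IdNumber = "1" },
            new CountingPerson { IdNumber = "2" },
        };
        return people.Contains(new CountingPerson { IdNumber = "2" });
    }

    // Hashed search: HashSet<T> asks GetHashCode first, then Equals.
    public static bool SetContains()
    {
        var people = new HashSet<CountingPerson>
        {
            new CountingPerson { IdNumber = "1" },
            new CountingPerson { IdNumber = "2" },
        };
        return people.Contains(new CountingPerson { IdNumber = "2" });
    }
}
```

After ListContains() the HashCalls counter is still 0; after SetContains() both counters have moved.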

What exactly do the operator methods do? It looks like some sort of voodoo black magic going on there.

This allows you to directly use == on two instances of your type. Without this, you'll be comparing by reference if your type is a class, not by the values within your type.
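A small sketch of the difference, using two hypothetical classes (PlainId and ValueId are made up for this example). It also shows one thing to watch out for: operators are picked at compile time from the static types, so comparing through object references falls back to reference comparison.

```csharp
using System;

// Hypothetical class without operator overloads: == compares references.
public sealed class PlainId
{
    public string Id { get; set; }
}

// Hypothetical class that overloads == and != to compare by Id.
public sealed class ValueId
{
    public string Id { get; set; }

    public static bool operator ==(ValueId a, ValueId b)
    {
        if (ReferenceEquals(a, null)) return ReferenceEquals(b, null);
        return !ReferenceEquals(b, null) && a.Id == b.Id;
    }

    public static bool operator !=(ValueId a, ValueId b)
    {
        return !(a == b);
    }

    // Kept consistent with the operators.
    public override bool Equals(object obj)
    {
        return this == (obj as ValueId);
    }

    public override int GetHashCode()
    {
        return Id == null ? 0 : Id.GetHashCode();
    }
}

public static class OperatorDemo
{
    // false: two different objects, and no overload to say otherwise
    public static bool PlainEquals()
    {
        return new PlainId { Id = "2" } == new PlainId { Id = "2" };
    }

    // true: the overloaded operator compares the Id values
    public static bool ValueEquals()
    {
        return new ValueId { Id = "2" } == new ValueId { Id = "2" };
    }

    // false: both variables are typed as object, so the compiler
    // uses the built-in reference comparison, not the overload
    public static bool ValueEqualsAsObject()
    {
        object a = new ValueId { Id = "2" };
        object b = new ValueId { Id = "2" };
        return a == b;
    }
}
```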

Reed Copsey

The purpose of GetHashCode is to balance a hash table, not to determine equality. When looking up a member of a hash table the hash bucket checked is determined by the hash code, and then whether the object is in the bucket or not is determined by equality. That's why GetHashCode has to agree with equality.
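The two-step lookup described above can be sketched with a toy hash set. TinyHashSet is a hypothetical class written for this illustration, with a fixed number of buckets and no resizing; it is not how Dictionary or HashSet are actually implemented.

```csharp
using System;
using System.Collections.Generic;

// A deliberately simplified hash set (fixed bucket count, no resizing).
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] _buckets;

    public TinyHashSet(int bucketCount = 16)
    {
        _buckets = new List<T>[bucketCount];
        for (int i = 0; i < _buckets.Length; i++)
            _buckets[i] = new List<T>();
    }

    // Step 1: the hash code alone decides which bucket to look in.
    private List<T> BucketFor(T item)
    {
        int hash = item == null ? 0 : item.GetHashCode();
        return _buckets[(hash & 0x7FFFFFFF) % _buckets.Length];
    }

    // Step 2: Equals decides whether the item is really in that bucket.
    public bool Contains(T item)
    {
        foreach (T candidate in BucketFor(item))
        {
            if (Equals(candidate, item)) // static object.Equals, null-safe
                return true;
        }
        return false;
    }

    // Returns false when an equal item is already present.
    public bool Add(T item)
    {
        if (Contains(item))
            return false;

        BucketFor(item).Add(item);
        return true;
    }
}
```

If two equal objects returned different hash codes, step 1 could send the lookup to the wrong bucket and step 2 would never get the chance to find the match.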

For more details see my article on the subject:

http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/

Eric Lippert
  • So basically, first it checks via the hash code whether the object exists in the collection, and only after that does it actually compare it using the overridden Equals method. Is this correct? – Only Bolivian Here Apr 09 '11 at 22:17
  • You don't "balance" a hash table using the hash key per se; it's more the hashing algorithm that does this. The hashing function needs to be fast and to produce as few collisions (i.e. same key) as possible. So to answer the question, balancing = making sure that the elements are distributed as uniformly as possible in the hash table. – Bogdan Gavril MSFT Apr 09 '11 at 22:19
  • That article contains a list of rules. Classes such as `Dictionary` assume these rules are true. However, there are also other places you might not be thinking of which use `GetHashCode`, such as various pieces of LINQ. These rules end up being broken when you modify `Equals` without modifying `GetHashCode`, causing code which relies on those rules to break. So, you must either avoid any code which relies on these rules or must not modify `Equals` without simultaneously modifying `GetHashCode`. Generally, the latter option is safer, since `GetHashCode` is a member of `object`. – Brian Apr 11 '11 at 21:18

GetHashCode and Equals are two very different things. Equals determines equality. GetHashCode returns a hash code suitable for a hash map, but equal hash codes do not guarantee equality. So whenever equality is the question, Equals is the method that decides it.

GetHashCode is intended for hash-based collections, such as a Dictionary. When looking up an item in a dictionary, the entry is matched first on the hash code, then on Equals.

driis

GetHashCode is used by the framework only when a hash table is involved.

If you need equality, you only care about Equals. MSDN suggests also implementing GetHashCode because sooner or later you might use your objects in a hash-like object (hash table, hash map, etc.).

Imagine the objects are 1000 bytes each and you need a fast way to determine equality between two of them: you calculate the hash key (via GetHashCode). If the keys do not match, the objects are different. If they do match, you cannot say for sure that they are equal; you need to verify with Equals(), which is more expensive.

Hash table collections use this idea.
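A rough sketch of that pre-check, assuming an immutable payload so the hash can be computed once and cached. Blob is a hypothetical class invented for this example, and the hash formula is just a simple one picked for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical immutable wrapper around a large byte payload.
public sealed class Blob : IEquatable<Blob>
{
    private readonly byte[] _data;
    private readonly int _hash; // computed once, since the data never changes

    public Blob(byte[] data)
    {
        _data = (byte[])data.Clone();
        unchecked
        {
            int h = 17;
            foreach (byte b in _data)
                h = h * 31 + b;
            _hash = h;
        }
    }

    public override int GetHashCode()
    {
        return _hash;
    }

    public bool Equals(Blob other)
    {
        if (ReferenceEquals(other, null))
            return false;

        // Cheap pre-check: different hash codes prove the blobs differ.
        if (_hash != other._hash)
            return false;

        // Same hash code proves nothing, so do the full comparison.
        return _data.SequenceEqual(other._data);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Blob);
    }
}
```

The shortcut only pays off because the hash is cached; recomputing it on every comparison would cost about as much as comparing the bytes directly.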

Bogdan Gavril MSFT
  • In some cases `GetHashCode` may be useful for equality testing even in non-hashed collections. For example, if an immutable tree type supports equality comparisons (leaf-nodes are equal if they hold identical data, and other nodes are equal if they have equal children), equality comparisons may be greatly expedited if nodes cache their `GetHashCode` values, and compare hash codes before comparing children. If one will be comparing many nearly-identical trees, using even a crummy `GetHashCode` which had 5% false matches could still speed things up by a factor of almost twenty. – supercat Sep 17 '12 at 16:41