2

How does Except determine if 2 values are the same

I have the following code

var removes = collection.Except(values, comparer).ToList();
var adds = values.Except( collection, comparer).ToList();
foreach (var item in removes)
{
    collection.Remove(item);
}
foreach (var item in adds)
{
    collection.Add(item);
}

however items that the comparer say are equal are included in the except lists, so to see what is happening I put a break point in the Equals function and its not being called, only the GetHashCode() function

So what criteria is being used to compare the items, is it only if the hashes are different that it calls the equality function?

Edit: the comparer Class and compared class are

public class Lookup
{
    public static readonly IEqualityComparer<Lookup> DefaultComparer = new EqualityComparer();
    private class EqualityComparer : IEqualityComparer<Lookup>
    {
        public bool Equals(Lookup x, Lookup y)
        {
            if (x == null)
                return y == null;
            else if (y == null)
                return false;
            else
                return x.ID == y.ID
                    && x.Category == y.Category
                    && x.DisplayText == y.DisplayText
                    && MetaData.CollectionComparer.Equals(x.MetaData, y.MetaData);
        }

        public int GetHashCode(Lookup obj)
        {
            var rtn = new { obj.ID, obj.Category, obj.DisplayText, obj.MetaData }.GetHashCode();

            return rtn;
        }
    }
    [DataMember]
    public int ID { get; set; }
    [DataMember]
    public LookupType Category { get; set; }
    [DataMember]
    public string DisplayText { get; set; }
    [DataMember]
    public MetaData[] MetaData { get; set; }
}
MikeT
  • 5,398
  • 3
  • 27
  • 43
  • If two object are equal, they should return the same value in GetHashCode. – Styxxy Mar 16 '16 at 16:15
  • 2
    Take a look: http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,771 – Jeroen Vannevel Mar 16 '16 at 16:15
  • Or here: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Except.cs – Alberto Chiesa Mar 16 '16 at 16:16
  • 2
    Post the implementaiton of `comparer`. It shouldn't matter _how_ it works if you implement `Equals` and `GetHashCode` properly. – D Stanley Mar 16 '16 at 16:17
  • @Styxxy yes they should if the hashing algorithm is correct however, hashing algorithm are very complex to do correctly so most of the time a hack is used, in this case its a call to anonymous class builders hashing algorithm – MikeT Mar 16 '16 at 16:20
  • @MikeT "is it only if the hashes are different that it calls the equality function" - that's typically how linq is implemented, but you shouldn't rely on undocumented implementation details. – D Stanley Mar 16 '16 at 16:21
  • I think you may be making the hashing more complex than it needs to be. Hashing based on field values is usually pretty straightforward. – D Stanley Mar 16 '16 at 16:22
  • 1
    And two "equal" objects _have_ to return the same hash code. Not just "should". – D Stanley Mar 16 '16 at 16:23
  • @DStanley that implies its impossible to create a hashing algorithm that's broken – MikeT Mar 16 '16 at 16:27
  • @MikeT In what way does it imply that? It's actually _very easy_ to create a broken one. The requirement is that your hash code algorithm returns equal hashes for equal objects. – D Stanley Mar 16 '16 at 16:30
  • @DStanley if something must be then it implies that its a fundamental and inescapable fact, if something Should be then then is a correct way and a wrong way – MikeT Mar 16 '16 at 16:35
  • @MikeT No, when talking about if something "must" when talking about specifications that means in implementations of the spec there is correct ways and incorrect ways to do it, and incorrect ways are in violation of the specification and should not be expected to work. "Should" mean a implementation is **recommended** to follow the instructions but if it does not follow the instructions the thing you are doing is still expected to work correctly. See [RFC2119](https://www.ietf.org/rfc/rfc2119.txt) for formal explanations of "Must", "Should" , and "May" mean when talking about specifications – Scott Chamberlain Mar 16 '16 at 16:39

2 Answers2

4

is it only if the hashes are different that it calls the equality function?

Yes, it's just like that. And that is for performance reasons (assuming that a GetHashCode implementation should always be much faster than an Equals implementation).

If two objects have a different hash code, they surely won't be the same or equal objects, so there is no need to call Equals. Only if the hash codes are the same, Equals gets called to see if they are really equal or just have the same hash code by accident.

So your GetHashCode implementation should always ensure that equal objects have the same hash code.

Since your implementation of GetHashCode creates an instance of an anonymous type and call GetHashCode on that instance, the hash codes will always be different and so all your objects are different from each other.

René Vogt
  • 43,056
  • 14
  • 77
  • 99
2

The problem is the implementation of your comparer. In equals you have

...
&& MetaData.CollectionComparer.Equals(x.MetaData, y.MetaData);

however in GetHashCode you have

new { ..., obj.MetaData }.GetHashCode();

That means it is using the default comparer for the hash code but MetaData.CollectionComparer for the equals. This is causing things that have MetaData.CollectionComparer.Equals(x.MetaData, y.MetaData) == true to return diffrent hash codes which breaks anything based on a Set logic (Which .Exclude( uses).

Replace the hash code hack of using a anonymous class with a actual hash code implementation using MetaData.CollectionComparer.GetHashCode(obj.MetaData) and it should work.

public int GetHashCode(Lookup obj)
{
    unchecked
    {
        var hashCode = obj.ID;
        hashCode = (hashCode*397) ^ (obj.Category != null ? obj.Category.GetHashCode() : 0);
        hashCode = (hashCode*397) ^ (obj.DisplayText != null ? obj.DisplayText.GetHashCode() : 0);
        hashCode = (hashCode*397) ^ MetaData.CollectionComparer.GetHashCode(obj.MetaData);
        return hashCode;
    }
}

(This implementation is the one ReSharper comes up with, tweaked to use MetaData.CollectionComparer)

Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • for the records you're probably correct about how to fix the hashing algorithm, but that wasn't the question that was asked, i needed to understand how linq performs compares to be able avoid making these mistakes in future – MikeT Mar 16 '16 at 16:51
  • I said that, "*This is causing things that have `MetaData.CollectionComparer.Equals(x.MetaData, y.MetaData) == true` to return diffrent hash codes which breaks anything based on a Set logic (Which `.Exclude(` uses).*" – Scott Chamberlain Mar 16 '16 at 16:52
  • Also, you could have always checked the msdn page for it you will see "*Produces the set difference of two sequences by using the specified IEqualityComparer to compare values.*", if you check the interface you will see you must implment `Equals(` and `GetHashCode(`. if you check the documentation of `GetHashCode(` [it states](https://msdn.microsoft.com/en-us/library/ms132155(v=vs.100).aspx) "*Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.*"... – Scott Chamberlain Mar 16 '16 at 16:56
  • You violated that contract with your implementation so you can not rely on anything that uses `IEqualityComparer` to work correctly. – Scott Chamberlain Mar 16 '16 at 16:58