7

Can I write a hash code function for the following comparer logic?

Two instances of My are equal if at least two properites from (A, B, C) match.

The Equals part is simple, but I'm stumped on the hash code part, and part of me is thinking it might not be possible.

class MyOtherComparer : IEqualityComparer<My>
{
    public bool Equals(My x, My y)
    {
        if (Object.ReferenceEquals(x, y)) 
            return true;      

        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) 
            return false;

        int matches = 0;

        if(x.A == y.A) matches++;

        if(x.B == y.B) matches++;

        if(x.C == y.C) matches++;

        // match on two out of three
        return  (matches > 1)
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.
    public int GetHashCode(My x)
    {
       // ???
    }
}

UPDATE: In addition to the correct answer by Reed Copsey, a very important point about the general usefulness of a fuzzy comparer is clearly stated by Ethan Brown - please see his answer as well for a full understanding of what underlies this Question/Answer.

Aaron Anodide
  • 16,906
  • 15
  • 62
  • 121
  • I would think that your intuition is right and that you can't reliably generate a hash code that stands with fuzzy comparison. – zneak Jun 04 '12 at 17:49
  • I would suggest not touching GetHashCode and Equals at all in your case, and make a method FuzzyEquals. – SimpleVar Jun 04 '12 at 17:54
  • What are the types of `A`, `B`, and `C`? Might we be able to modify their `GetHashCodes` formulas? – Servy Jun 04 '12 at 17:56
  • @YoryeNathan, I want to use it with framework methods that take `IEqualityComparer` such as `IEnumerable.Except(...)` so I have to stick to the interface – Aaron Anodide Jun 04 '12 at 18:00
  • 2
    You have a bigger problem, in that you can have three objects, where "one==two" and "one==three" but "two!=three" - which will mess up most routines that use `IEqualityComparer` anyways. – Reed Copsey Jun 04 '12 at 18:01
  • @AaronAnodide Then make IFuzzyEqualityComparer extend from IEqualityComparer, and so on. – SimpleVar Jun 04 '12 at 18:05
  • I'm starting to see this now - wow just when you think you understand something a whole new level of detail emerges... – Aaron Anodide Jun 04 '12 at 18:06
  • Reed is correct; equality is supposed to be an *equivalence relation*. That is, it should be **reflexive**: x == x is always true. It should be **symmetric**: if x == y is the same as y == x. And it should be **transitive**: x == y and y == z implies that x == z. If you violate these properties then equality stops working as users expect. – Eric Lippert Nov 19 '13 at 17:25

6 Answers6

5

Yes, it's possible. The simplest implementation would be to always return a constant.

public int GetHashCode(My x) 
{ 
   return 0;
}

The GetHashCode documentation states:

Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.

However, you are perfectly free to return the same hash code for two objects which are not equal, as well.

That being said, this will potentially cause some algorithms to perform very poorly, as you'll get lots of hash collisions. However, given the nature of your odd/unique equality check, this may be required.


Note that this is going to be problematic in any case. Given your logic, it's possible to have three objects, where comparer.Equals(foo, bar)==true and comparer.Equals(foo, baz)==true but comparer.Equals(baz, bar)==false. This is likely to be problematic in many cases where IEqualityComparer<T> is used.

Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • oh that last bit is very interesting to me - your hinting that a comparer that works the way I coded it has issues other than possibly slow perforance due to constant hash - it sounds like you saying all assumptions are going out the window here... I originally was trying to use this with IEnumerable.Except(T, IEqualityComparer) - luckly it was in the spirit of learning/practicing and nothing important hinges on it working :) – Aaron Anodide Jun 04 '12 at 18:04
  • @AaronAnodide Yeah - I'd expect many of the Enumerable extension methods to do strange things with this comparer... YMMV – Reed Copsey Jun 04 '12 at 18:10
  • I re-read what you wrote I my brain flashed back to 8th grade math class and I think you are saying that `Equals` is required to return truth values such that the transitive property of equality is satisfied... – Aaron Anodide Jun 04 '12 at 18:20
  • @AaronAnodide Effectively, yes - I don't know how many algorithms require this, but the expectation of `Equals` is typically that it's transitive. – Reed Copsey Jun 04 '12 at 19:03
1

A hash code has to be the same for two equal objects, but it does not have to be different for two different objects. You could return the same value for all objects to satisfy the IEqualityComparer consumers, but there's no way I know of to get any speed benefits from the hash in your situation.

C.Evenhuis
  • 25,996
  • 2
  • 58
  • 72
1

Can I write a hash code function for the following comparer logic?

Yes. You can always write a hash code for anything. The question is how efficient it will be. No matter what, you can always have:

public int GetHashCode()
{
  return 0;
}

It will always work, but it's horribly * inefficient*.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • I'm not sure there is a way to generate a reasonable hash without more information about the objects themselves. If there was some form of pattern in allowable values, then there may be a valid way to generate a hash that would reduce collisions - but with 3 arbitrary values, and "fuzzy" matching like this, I'm not sure there is anything guaranteed better than a constant. – Reed Copsey Jun 04 '12 at 17:56
1

Suppose that we have 2 objects A,B. Each of them have properties p1,p2 and p3. Suppose A.p1 == B.p1 and A.p3 == B.p3 , if the hash function depends on p2 , it'll be different for A and B so they are not equal. If you want to calculate the hash function based on the p1 and p3, there are many examples which hash function wouldn't return the right hash value and many equal objects will be not equal. I think we can't have a variable function. You can use a constant one but if you want to use it as a hash key in a dictionary or hash table you will not gain near O(1) complexity.

Kamran Amini
  • 1,062
  • 8
  • 14
1

The core problem with arriving at a non-constant hash function is that you can't ensure transitivity across equality. Normally, equality is considered transitive. That is, A=B and B=C implies that A=C (which further implies that A, B, and C would all have the same hash code). However, with your definition of equality, you could have A=B, B=C and A!=C. Ideally, unequal elements would have different hash codes, so A and C would have different hash codes; but they can't because they both equal B, so they have to all have the same hash code.

The only way you could arrive at a non-constant hash function is if you knew something about your total collection. You would have to partition the collection into "equality bins" where each element in a bin is equal to some other element in the bin (including the possibility of a bin of one). Once you have done this partitioning, then you can use that to generate a non-constant algorithm (assuming you get more than one bin) for generating a hash code.

The thing about the idea of equality bins is that there could be many such configurations of bins. As a selection criteria, you may want to maximize the number of bins (to improve hashtable lookup performance). The degenerative case (as indicated in Reed Copsey's correct answer) is that you put everything in the same bin (though, as supercat points out in the comments below, the name "equality bins" then becomes misleading). That doesn't violate any of the constraints of hash values, but it will lead to poor performance in algorithms that expect has values to produce a non-degenerative partitioning.

As supercat points out below, to satisfy the constraints of hash values, the following must be true: if two elements are in two different bins, they must not be equal (however, two elements in the same bin do not have to be equal).

Ethan Brown
  • 26,892
  • 4
  • 80
  • 92
  • I *just* figured that out for myself as evidenced in my last comment on the thread I marked answer before I totally got it - thank you though - this is indeed the core problem here – Aaron Anodide Jun 04 '12 at 18:21
  • I think your description of "equality bins" isn't quite right. What would be required is that every element of an equality bin be *unequal* to every object anywhere in the universe which isn't in that same bin. – supercat Nov 16 '13 at 00:51
  • You're essentially correct, @supercat: I didn't word that second paragraph precisely enough. I'll update accordingly. – Ethan Brown Nov 18 '13 at 19:57
  • @EthanBrown: The term "equality bins" is fine; the important thing is objects that *might* be equal to each other must be in the same bin. Only if two objects are *known* to be unequal may they occupy different bins. – supercat Nov 18 '13 at 20:12
  • You're right, supercat, I keep missing the important bit of info! I just updated the answer...I think I got it this time. – Ethan Brown Nov 19 '13 at 17:14
0

Seeing that your real problem was dealing with the Except extension method, I decided to suggest something for you, although not really an answer.

public class EqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _comparer;
    private readonly Func<T, int> _hashCoder;

    public EqualityComparer(Func<T, T, bool> comparer, Func<T, int> hashCoder = null)
    {
        if (comparer == null)
        {
            throw new ArgumentNullException("comparer");
        }

        this._comparer = comparer;
        this._hashCoder = hashCoder ?? (x => 0);
    }

    public bool Equals(T x, T y)
    {
        return this._comparer(x, y);
    }

    public int GetHashCode(T obj)
    {
        return this._hashCoder(obj);
    }
}

And then you can use it like so:

arr1.Except(arr2, new EqualityComparer<dynamic>((x, y) =>
     {
         if (ReferenceEquals(x, y))
             return true;

         if (ReferenceEquals(x, null) ||
             ReferenceEquals(y, null))
             return false;

         var matches = 0;

         if (x.A == y.A) matches++;
         if (x.B == y.B) matches++;
         if (x.C == y.C) matches++;

         return (matches > 1);
     }));
SimpleVar
  • 14,044
  • 4
  • 38
  • 60