I'm trying to create and use only immutable classes where all fields are readonly immutable types, though there may be additional fields which are mutable and not considered to be part of the object's state (mainly a cached hashcode).

When implementing IEquatable<T>, I do the same as I would for mutable objects, i.e.:

public bool Equals(MyImmutableType o) => 
  object.Equals(this.x, o.x) && object.Equals(this.y, o.y);

For an immutable object this seems inefficient: the object will never change, so if I could calculate and store some unique fingerprint of it, I could simply compare fingerprints instead of comparing every field (each of which may call its own Equals, and so on).

What would be a good solution for this? Would BinaryFormatter + MD5 be worth exploring?

Eric Lippert
kofifus
  • What you have is not inefficient, it's the correct way to write an `Equals` method. – Rufus L Feb 08 '19 at 01:31
  • Is this "inefficiency" actually a measurable problem for you? You cannot compare (only) hash-codes in `Equals`, that won't guarantee equality. – Blorgbeard Feb 08 '19 at 01:31
  • Rufus, it depends on the complexity of the object; it can be far less efficient than, e.g., comparing two MD5 values – kofifus Feb 08 '19 at 01:32
  • Are you planning to compare each and every created object very often? – Andrew Feb 08 '19 at 01:33
  • possibly they are used as keys in maps etc .. I was thinking if there could be a general pattern of computing/caching a fingerprint on creation for immutable objects – kofifus Feb 08 '19 at 01:34
  • If your code doesn't have to be correct, it can be infinitely efficient. If you want your Equals method to actually work, then you are already doing the minimum amount of work required, disregarding micro-micro-optimizations that you will likely not be able to measure anyway. – Blorgbeard Feb 08 '19 at 01:35
  • Blorgbeard - but I do it _every_ time I compare vs a one-time fingerprint calculation – kofifus Feb 08 '19 at 01:38
  • Any "fingerprint" that is actually guaranteed to be unique will contain the same amount of data that you are comparing now. You'll be doing the same amount of work comparing them. – Blorgbeard Feb 08 '19 at 01:43
  • MD5 is not worth exploring. MD5 is designed to avoid collisions in the face of hostile input. This makes MD5 very slow compared to hash functions designed for non-hostile input. Note that you also should not use MD5 to avoid collisions in the face of hostile input; people have figured out how to create collisions. Similarly, `BinaryFormatter` is generally an inefficient way to implement equality, for the same reason that it's cheaper to compare two ints than it is to `ToString` those ints and compare the strings. – Brian Feb 08 '19 at 14:03

1 Answer

Since you've already overridden Equals, you are required to also override GetHashCode. Remember, the fundamental rule of GetHashCode is that equal objects have equal hashes.

Therefore, you have overridden GetHashCode.

Since equal objects are required to have equal hash codes, you can implement Equals as:

public static bool Equals(M a, M b)
{
  if (object.ReferenceEquals(a, b)) return true;
  // If both of them are null, we're done, but maybe one is.
  if (object.ReferenceEquals(null, a)) return false;
  if (object.ReferenceEquals(null, b)) return false;
  // Both are not null.
  if (a.GetHashCode() != b.GetHashCode()) return false;
  if (!object.Equals(a.x, b.x)) return false;
  if (!object.Equals(a.y, b.y)) return false;
  return true;
}

And now you can implement as many instance versions of Equals as you like by calling the static helper. Also overload == and != while you're at it.
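
As a rough sketch of how the pieces could fit together (not necessarily how the answer's author would write it): the class below wires the static helper to the instance `Equals` overloads, to `GetHashCode`, and to the `==` and `!=` operators. The `string` field types and the constructor are assumptions made for illustration, and the hash is computed once in the constructor rather than lazily, which is one simple way to cache it for an immutable type.

```csharp
using System;

// Hypothetical immutable type; the string fields are an assumption for illustration.
public sealed class M : IEquatable<M>
{
    private readonly string x;
    private readonly string y;
    private readonly int hash; // computed once; safe because x and y never change

    public M(string x, string y)
    {
        this.x = x;
        this.y = y;
        unchecked
        {
            // Simple field-combining hash; null fields contribute 0.
            int h = 17;
            h = h * 31 + (x?.GetHashCode() ?? 0);
            h = h * 31 + (y?.GetHashCode() ?? 0);
            hash = h;
        }
    }

    // Static helper: cheap early outs first, field comparison last.
    public static bool Equals(M a, M b)
    {
        if (ReferenceEquals(a, b)) return true;
        // ReferenceEquals rather than ==, because == below calls this method.
        if (ReferenceEquals(null, a)) return false;
        if (ReferenceEquals(null, b)) return false;
        if (a.hash != b.hash) return false; // unequal hashes imply unequal values
        return object.Equals(a.x, b.x) && object.Equals(a.y, b.y);
    }

    // Every instance version and both operators delegate to the helper.
    public bool Equals(M other) => M.Equals(this, other);
    public override bool Equals(object obj) => M.Equals(this, obj as M);
    public override int GetHashCode() => hash;

    public static bool operator ==(M a, M b) => M.Equals(a, b);
    public static bool operator !=(M a, M b) => !M.Equals(a, b);
}
```

With this in place, `new M("p", "q") == new M("p", "q")` is true even though the two references differ, and a comparison against `null` returns false without recursing into the operator.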

That implementation takes as many early outs as possible. Of course, the worst-performing case is the case where we have value equality but not reference equality, but that's also the rarest case! In practice, most objects are unequal to each other, and most objects that are equal to each other are reference equal. In those 99% cases we get the right answer in four or fewer highly efficient comparisons.

If you are in a scenario where it is extremely common for there to be objects that are value equal but not reference equal, then solve the problem in the factory; memoize the factory!
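
One possible shape for a memoized factory is sketched below. The `InternedPoint` type, its `int` fields, and the `Create` method are hypothetical names chosen for the example; the idea is only that a private constructor plus a cache keyed on the field values means value-equal instances are always the same reference, so the first early out in `Equals` handles them.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical type: instances can only be obtained through Create,
// so two value-equal instances are always the same reference.
public sealed class InternedPoint
{
    private static readonly ConcurrentDictionary<(int, int), InternedPoint> cache =
        new ConcurrentDictionary<(int, int), InternedPoint>();

    public int X { get; }
    public int Y { get; }

    private InternedPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Returns the cached instance for (x, y), creating it on first use.
    public static InternedPoint Create(int x, int y) =>
        cache.GetOrAdd((x, y), key => new InternedPoint(key.Item1, key.Item2));
}
```

Note that this cache never evicts entries, so it trades memory for cheap comparisons; whether that is acceptable depends on how many distinct values the program creates.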

Eric Lippert
  • thx! isn't the whole point of defining `Equals` to create situation where "we have value equality but not reference equality" ? why is that rare ? – kofifus Feb 08 '19 at 01:46
  • I've always considered it more accurate to think of it as *unequal objects have unequal hashes* since it stops people from falling into the trap of *equal hashes means equal values* which is not guaranteed to be true. – Corey Feb 08 '19 at 01:48
  • @kofifus: The rare situation is that `a` and `b` are value equal but not reference equal. Here, pick any two objects in the universe; odds are good that they are reference-unequal and hash-unequal, so we get to that case first. Now suppose they are equal. How likely is it that two objects are equal, but *not* reference equal? The vast majority of the time that I have objects with value equality, they are also reference equal *because I don't go around making a lot of redundant copies of the same information*. – Eric Lippert Feb 08 '19 at 01:50
  • @Corey: Your statement is dangerously false and you should stop believing it right now. It is perfectly legal for unequal objects to have equal hashes! – Eric Lippert Feb 08 '19 at 01:51
  • @EricLippert Sorry, wrong way around. How about *unequal hashes means unequal values*? Didn't check the logic as I was typing. – Corey Feb 08 '19 at 01:52
  • @Corey: your theory is that it is "more accurate" to think of it in the way that you get wrong while typing up attempts at correcting experts; you *might* want to revisit that theory. **It is more accurate to think of it as "equal objects must have equal hash codes".** As you've just demonstrated, even smart people find it difficult to correctly reason about *multiple opposites in one sentence*. If you possibly can, reason about *equality*, not *un-equality*; you'll be more likely to get it right! – Eric Lippert Feb 08 '19 at 01:56
  • And yet it's a common mistake for people to think that equal hashes means equal values. If we approach it from *different hash = different value* then the logical implication *same value = same hash* is fairly clear. Is it not more useful then - and arguably more accurate - to think *different hash = different value*? – Corey Feb 08 '19 at 02:00
  • PS, VS19 recommends changing `if (object.ReferenceEquals(null, a)) return false;` to `if (a == null) return false;` maybe `if (a is null)` is even better ? – kofifus Feb 08 '19 at 02:04
  • @kofifus: Read my answer again very carefully, and *think about all the advice in it*. Now, once you have done so: **why is it completely wrong to replace `ReferenceEquals(null, a)` with `a == null` in this scenario**? And why is it not wrong to use `is null`? (in versions of C# which support it) – Eric Lippert Feb 08 '19 at 02:35
  • @Corey: Please don't use `=` as shorthand for logical implication, as some people may read it as "equals," which is a symmetric operation. I normally favor `=>`, but admit that runs the risk of being interpreted as a lambda. → avoids both of these concerns, at the risk of not rendering on some browsers (and it's harder to type). – Brian Feb 08 '19 at 14:10
  • @kofifus: Exactly! My suggestion is that you override == when you override Equals, and that you make == call Equals, which then means that Equals should not use ==, to avoid recursion. But more generally, when I write ReferenceEquals, it's because I want the reader of the code to 100% guaranteed understand that I am doing a reference equal. "is null" is even better. – Eric Lippert Feb 08 '19 at 14:50
  • so you mean that VS19 proposed change here is buggy ? should we inform the VS team somehow ? – kofifus Feb 09 '19 at 01:22
  • and also Eric, can you see the deeper issue here ? experienced programmers struggling with subtle bug-prone nuances on a basic thing like equality tests ... even c++ was easier with this – kofifus Feb 09 '19 at 01:26
  • Can I see the deeper issue about equality being hard to implement in C#? **Yes I can**. I have written about it frequently over the last fifteen years; it's one of my favourite complaints about C#. I don't have time to give you a dozen links right now, but if you do a web search for "eric lippert C# equality", or do a similar search on this site, you will find *many* examples. – Eric Lippert Feb 09 '19 at 01:42
  • I tried to capture this 'pattern' in an extension method here - https://stackoverflow.com/questions/54877077/equality-and-polymorphism – kofifus Feb 27 '19 at 03:48