6

Attempt #3 to simplify this question:

A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.

Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.

What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().

Questions:

  1. Am I understanding that correctly?
  2. Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
  3. What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
  4. What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...

In general, how does one separate value equality from reference equality when overriding .Equals()?



The original line of code spurring this question:

//  For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;


StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));

.

Community
  • 1
  • 1
James King
  • 6,233
  • 5
  • 42
  • 63
  • 1
    Your question is really hard to understand. Are you planning to inherit from `DataRow` and override the equality methods? Or do you plan to create a custom `IEqualityComparer`? Where are you changing the data in the `DataRow`? Where is the code that needs to to compare `DataRows` based on values? – Yacoub Massad May 15 '16 at 21:40
  • 2
    It really just sounds like you need to create your own `IEqualityComparer` instance and use that in your queries. – Enigmativity May 16 '16 at 01:07
  • Yeah, I know, but I can't win... if I try to simplify questions, I get slammed for not having enough info. But I would have preferred to skip everything in the middle and just ask the question. I'll try to summarize it up front. – James King May 16 '16 at 03:05
  • @YacoubMassad, I did my best to summarize the question up front. If it's still not clear, ask away. Thanks for looking at it. – James King May 16 '16 at 03:36
  • It's still pretty unclear what you're asking here. I think @Enigmativity is probably correct - you don't need to (or want to) override Equals here. As Yacoub asked, where is the code that compares these? It doesn't seem any of the code you've included is relevant to your problem. A [mcve] would be useful. – Charles Mager May 16 '16 at 07:05
  • I'm not sure how I would use a custom `IEqualityComparer` in a linq query. The code that compares these would just be an `if(row.equals(otherrow))`. The issue is the `.Where(match => !DatafeedRow.StagingTableMatches.Contains(match))`. If linq would use the custom `IEqualityComparer` for that, then I could override `.Equals()`. But that seems to be asking for trouble - I'd have to guarantee that any code written against those DataRows would use my custom comparer. – James King May 16 '16 at 13:30
  • I did update the question to make it clearer I'm working with a typed DataRow - i.e. yes, I'm inheriting from `DataRow`. – James King May 16 '16 at 13:31
  • @JamesKing, if there are multiple ways of comparing two data rows, then you cannot put these two ways into the same `DataRowSubClass` class. You have to put them into two different `IEqualityComparer`s. Or you can put one way of comparing into `DataRowSubClass` and another one into its own `IEqualityComparer`. It is a rule that `GetHashCode` should return the same value for two objects that `Equals` return true for. – Yacoub Massad May 16 '16 at 16:49
  • I have completely rewritten this question... I think I understand what @Enigmativity is driving at, but need to be sure. – James King May 16 '16 at 18:15
  • 3
    Why not use the `Enumerable.Contains(` that [takes in a equality comparer](https://msdn.microsoft.com/en-us/library/bb339118(v=vs.100).aspx), then it does not matter what the default comparer is. – Scott Chamberlain May 16 '16 at 18:17
  • I added the line of code that started this question (the original version had a very long explanation of what I need to do and why). The short answer (if I understand your question) is that doing so would break encapsulation. If everyone needs to know (and remember) to always use a custom equality comparer, and that custom comparer is already defined/provided/not interchangeable, it should just be part of my object, true? – James King May 16 '16 at 18:28
  • @JamesKing to your last point; if that definition of "equality" should be the default, then override `Equals` and `GetHashCode` and implement `IEquatable` , not `IEqualityComparer`. – D Stanley May 16 '16 at 18:41
  • @JamesKing - With regard to your "Attempt #3" there is one glaring issues that I want to point out. You **cannot** override `.Equals` **without** also overriding `.GetHashCode` - if you don't you **will break equality**. However, the value of `GetHashCode` mustn't change throughout the usage of the object if it is used in any structure or query that calls `.GetHashCode`. Therefore, you can only safely use **immutable** values to compute the hash code. So your **only safe choice** is to implement your own `IEqualityComparer` to handle this situation. – Enigmativity May 16 '16 at 23:13
  • As an aside, it looks like `StagingTableMatches` should be a `HashSet` rather than a list. You can specify an equality comparer if required. – Charles Mager May 17 '16 at 06:31

2 Answers2

4

Ok, let's handle a few misconceptions first:

By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.

This is true, but only for reference types. Value types will implement a very slow reflection-based Equals function by default, so it's in your best interest to override that.

I can't think of a single case where that would be desirable for a reference type.

Oh I'm sure you can... String is a reference type for instance :)

What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().

Err... No.

IEqualityComaprer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:

  • It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
  • It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)

So, for your actual questions:

Am I understanding that correctly?

You still seem a bit confused about this, but don't worry :)

Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.

Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()

By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.

What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()

Not necessarily. It should be self-consistent: if myEqualitycomaprer.Equals(a, b) then you must ensure that myEqualitycomaprer.GetHashCode(a) == myEqualitycomaprer.GetHashCode(b).

It can be the same implementation than StagingDataRow.GetHashCode, but not necessarily.

What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...

Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.


Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.

Community
  • 1
  • 1
Lucas Trzesniewski
  • 50,214
  • 11
  • 107
  • 158
  • Thanks for the further clarification... I get what Enigmativity was saying now, that `IEqualityComparer` belongs to a separate class (and is therefore not the tool I'm looking for). I'm not worried about value types... in fact, this issue is specific to reference types (string doesn't count, I can't inherit from it :) ). But even if I implement/override `IEquatable` in my class, I can only pick one "equals" - value or reference. The assumption that these are interchangeable seems like a flawed design to me. – James King May 16 '16 at 19:09
  • You have to decide which *semantics* to give to your reference type. I agree that most of the time you should use reference equality semantics (principle of least astonishment), but for other cases value type semantics may make more sense (as for `string` for instance). Remember that you can *always* force reference semantics by explicitly using `ReferenceEquals`, and if you need to handle several different equality comparison types (such as canse sensitive/insensitive string comparison), you'll have to use an `IEqualityComparer` so you can tell which one you want. – Lucas Trzesniewski May 16 '16 at 19:17
  • Yeah, where I'm coming down in all this is that I have to leave the default `.Equals()` in place because the framework expects reference equality by default. Therefore my value equality tests have to be in a separate method. Which works, just... poop. Thanks. – James King May 16 '16 at 19:34
  • The framework makes no assumptions whatsoever from any custom type. You don't *have* to leave reference equality, there's no such guideline. If the rest of *your* code assumes so, then it's a different matter of course, but the framework just... doesn't care. – Lucas Trzesniewski May 16 '16 at 19:40
2

Am I understanding that correctly?

Partially - the "default" IEqualityComparer will use either (in order):

  1. The implementation of IEquatable<T>
  2. An overridden Equals(object)
  3. the base object.Equals(object), which is reference equality for reference types.

I think you are confusing two different methods of defining "equality" in a custom type. One is by implementing IEquatable<T> Which allows an instance of a type to determine if it's "equal" to another instance of the same type.

The other is IEqualityComparer<T> which is an independent interface that determines if two instance of that type are equal.

So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.

If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.

Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?

No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.

What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?

It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.

It would be strange to have an instance method accept itself as a parameter...

Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>

Bottom Line

If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.

Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Got it. Or at least half of it. I feel like I'm futilely flopping around trying to explain the issue, which is really just that overriding `.Equals()` to give me value equality breaks everything that assumes `.Equals()` means the same object. Two instances of a class can have value equality and be in the same `List`, but `.Contains()` should not consider them as equal. I implement `IEquatable`, but I don't want `IEquatable.Equals()` to be used for lookups that should be matching the instances themselves - not their values. – James King May 16 '16 at 18:42
  • "I don't want `IEquatable.Equals()` to be used for lookups that should be matching the instances themselves" - Then don't override `Equals` or implement `IEquatable`. Let reference equality be the default, and create a class that implements `IEqualityComparer` and pass that in when you want to consider value equality. – D Stanley May 16 '16 at 18:43
  • Then everyone who uses my class needs to know they must use my custom comparer everywhere comparisons are required, and if they forget, will have a subtle and intermittent bug to pin down. I'm deliberately building unintended side effects into the design. – James King May 16 '16 at 18:52
  • But you just said you don't want to change the default since it will break existing code! One has to be the default - you can override `Equals` to be value equality, and have a custom comparer that goes back to reference equality, or you can leave reference equality as the default and have a custom comparer that uses value equality. There's no way to have both as the "default". – D Stanley May 16 '16 at 18:59
  • I get what you're saying, and probably 12 hours ago I was ready to say 'screw it' and just implement an 'Equals' function with a different name for value equality, but I was really hoping there was a cleaner solution... I can't think of a single case where I would want to override `.Equals()` and want to use value equality to retrieve different instances from a collection. I have to be missing something major - this can't be an uncommon scenario. – James King May 16 '16 at 19:01
  • @JamesKing That's where I'm confused - in your original question you say you _do_ want to compare objects by their values, so apparently you _do_ have a valid use case. The bottom line is you can either change the default or implement it in a separate class. Pick your poison... – D Stanley May 16 '16 at 19:14
  • Yeah, that's what I'm getting - one or the other :( It's hard to be clear without a code dump... When I'm adding objects to a collection, I'm comparing by matching values (user records in more than one datafeed). But I have three unique identifiers, none of which are required, and all of which can change during a user's employment... so I have to say "add to collection where the row doesn't already exist" (because the same row can match on more than one ID). I'm basically in a straight jacket with one foot stapled to the back of my head. : | Can repost a bigger code snippet if you want. – James King May 16 '16 at 19:28
  • It really sounds like you just want a custom `IEqualityComparer` for that scenario - I'm not sure what the the concern is. If you want to make it more obvious to possible users you can make it a _nested_ class so it's inseparable from the class definition. – D Stanley May 16 '16 at 20:01