Strange behavior of EqualityComparer with nullable fields

Question

Assume there is this class:

public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }
}

I need to compare these objects by following rules:

If both objects have value for NullableId then we compare both Id and NullableId
If some of the objects/both of them do not have NullableId then ignore it and compare only Id.

To achieve it I have overwritten Equals and GetHashCode like this:

public override bool Equals(object obj)
{
    var otherFoo = (Foo)obj;

    var equalityCondition = Id == otherFoo.Id;

    if (NullableId.HasValue && otherFoo.NullableId.HasValue)
        equalityCondition &= (NullableId== otherFoo.NullableId);

    return equalityCondition;
}

public override int GetHashCode()
{
    var hashCode = 806340729;
    hashCode = hashCode * -1521134295 + Id.GetHashCode();
    return hashCode;
}

Further down I have two lists of Foo:

var first = new List<Foo> { new Foo(1, null) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };

Next, I want to join these lists. If I do it like this:

var result = second.Join(first, s => s, f => f, (f, s) => new {f, s}).ToList();

then the result would be as I expected and I will get 3 items. But, if I change order and join first with second:

var result = first.Join(second, f => f, s => s, (f, s) => new {f, s}).ToList();

then the result would only have 1 item - new Foo(1, null) and new Foo(1 ,3)

I can not get what am I doing wrong. If try to put a break point in Equals method then I can see that it tries to compare items from same list (e. g. compare new Foo(1, 1) and new Foo(1 ,2)). For me it looks like that happens because of Lookup that is being created inside Join method.

Could someone clarify what happens there? What should I change to achieve desired behavior?

Weeble · Accepted Answer · 2020-03-15T23:47:52.757

Your Equals method is reflexive and symmetric, but it is not transitive.

Your implementation doesn't meet the requirements specified in the docs:

If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.

from https://learn.microsoft.com/en-us/dotnet/api/system.object.equals?view=netframework-4.8

For example, suppose you have:

var x = new Foo(1, 100);
var y = new Foo(1, null);
var z = new Foo(1, 200);

You have x.Equals(y) and y.Equals(z) which implies that you should also have x.Equals(z), but your implementation does not do this. Since you don't meet the specification, you can't expect any algorithms reliant on your Equals method to behave correctly.

You ask what you can do instead. This depends on exactly what you need to do. Part of the problem is that it's not really clear what is intended in the corner-cases, if indeed they can appear. What should happen if one Id appears multiple times with the same NullableId in one or both lists? For a simple example, if new Foo(1, 1) exists in the first list three times, and the second list three times, what should be in the output? Nine items, one for each pairing?

Here's a naive attempt to solve your problem. This joins on only Id and then filters out any pairings that have incompatible NullableId. But you might not be expecting the duplicates when an Id appears multiple times in each list, as can be seen in the example output.

using System;
using System.Linq;
using System.Collections.Generic;

public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }

    public override string ToString() => $"Foo({Id}, {NullableId?.ToString()??"null"})";
}

class MainClass {
  public static IEnumerable<Foo> JoinFoos(IEnumerable<Foo> first, IEnumerable<Foo> second) {
    return first
        .Join(second, f=>f.Id, s=>s.Id, (f,s) => new {f,s})
        .Where(fs =>
            fs.f.NullableId == null ||
            fs.s.NullableId == null ||
            fs.f.NullableId == fs.s.NullableId)
        .Select(fs => new Foo(fs.f.Id, fs.f.NullableId ?? fs.s.NullableId));    
  }
  public static void Main (string[] args) {
    var first = new List<Foo> { new Foo(1, null), new Foo(1, null), new Foo(1, 3) };
    var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(1, null) };
    foreach (var f in JoinFoos(first, second)) {
      Console.WriteLine(f);
    }
  }
}

Output:

Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 3)
Foo(1, 3)

It also might be too slow for you if you have tens of thousands of items with the same Id, because it builds up every possible pair with matching Id before filtering them out. If each list has 10,000 items with Id == 1 then that's 100,000,000 pairs to pick through.

Thanks for reply. This is very good point. Now I see it. Do you have any idea how can I achieve behavior I want? — Yurii, Mar 15 '20 at 19:10
Many thanks for help. I will go with solution you suggested. — Yurii, Mar 17 '20 at 08:58

V. S. · Answer 2 · 2020-05-02T16:31:21.727

My answer contains a program that I believe is better than the one proposed in Weeble's answer but first I would like to demonstrate how the Join method works and talk about problems I see in your approach.

As you can see here https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.join?view=netframework-4.8 the Join method

Correlates the elements of two sequences based on matching keys.

If the keys don't match then elements from both collections are not included. For example, remove your Equals and GetHashCode methods and try this code:

  var first = new List<Foo> { new Foo(1, 1) };
  var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };

  //This is your original code that returns no results
  var result = second.Join(first, s => s, f => f, (f, s) => new { f, s }).ToList();
  result = first.Join(second, s => s, f => f, (f, s) => new { f, s }).ToList();

  //This code is mine and it returns in both calls of the Join method one element in the resulting collection; the element contains two instances of Foo (1,1) - f and s 
  result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
  result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();

But if you set your original data input that contains null with my code:

  var first = new List<Foo> { new Foo(1, null) };
  var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };

  var result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
  result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();

the result variable will be empty in both cases since the key { 1, null } doesn't match any other key, i.e. { 1, 1 }, { 1, 2 }, { 1, 3 }.

Now returning to your question. I would suggest you reconsider your entire approach in cases like this and here is why. Let us imagine that your implementation of the Equals and GetHashCode methods worked as you expected and you even didn't post your question. Then your solution creates the following outcomes, as I see it:

To understand how your code calculates its output the user of your code has to have access to the code of the Foo type and spend time reviewing your implementation of the Equals and GetHashCode methods (or reading documentation).
With such implementation of the Equals and GetHashCode methods, you are trying to change the expected behavior of the Join method. The user may expect that the first element of the first collection Foo(1, null) will not be considered equal to the first element of the second collection Foo(1, 1).
Let us imagine that you have multiple classes to join, each is written by some individual, and each class has its own logic in the Equals and GetHashCode methods. To figure out how actually your joining works with each type the user instead of looking into a joining method implementation only once would need to check the source code of all those classes trying to understand how each type handles its own comparison facing different variations of things like this with magic numbers (taken from your code):
```
 public override int GetHashCode()
 {
     var hashCode = 806340729;
     hashCode = hashCode * -1521134295 + Id.GetHashCode();
     return hashCode;
 }
```
It may don't seem a big problem but imagine you are a new person on the the project, you have a lot of classes with logic like this and limited time to complete your task, e.g. you have an urgent change request, huge sets of data input, and no unit tests.

If someone inherites from your class Foo and put an instance of Foo1 to the collection among with Foo instances:

public class Foo1 : Foo
{
    public Foo1(int id, int? nullableId) : base (id, nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }
    public override bool Equals(object obj)
    {
        var otherFoo1 = (Foo1)obj;
        return Id == otherFoo1.Id;
    }
    public override int GetHashCode()
    {
        var hashCode = 806340729;
        hashCode = hashCode * -1521134295 + Id.GetHashCode();
        return hashCode;
    }
}

var first = new List<Foo> { new Foo1(1, 1) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3)};

var result = second.Join(first, s => s, f => f, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => s, f => f, (f, s) => new { f, s }).ToList();

then you have here a run-time exception in the Equals method of the type Foo1: System.InvalidCastException, Message=Unable to cast object of type 'ConsoleApp1.Foo' to type 'ConsoleApp1.Foo1'. With the same input data, my code would work fine in this situation:

var result = second.Join(first, s => s.Id, f => f.Id, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => s.Id, f => f.Id, (f, s) => new { f, s }).ToList();

With your implementation of the Equals and GetHashCode methods when someone modifies the joining code like this:

 var result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
 result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();

then your logic in the Equals and GetHashCode methods will be ignored and you will have a different result.

In my opinion, this approach (with overriding Equals and GetHashCode methods) may be a source of multiple bugs. I think it is better when your code performing joining has an implementation that can be understood without any extra information, the implementation of the logic is concentrated within one method, the implementation is clear, predictable, maintainable, and it is simple to understand.

Please also note that with your input data:

 var first = new List<Foo> { new Foo(1, null) };
 var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };

the code in the Weeble's answer generates the following output:

 Foo(1, 1)
 Foo(1, 2)
 Foo(1, 3)

while as far as I understand you asked for an implementation that with the input produces output that looks like this:

 Foo(1, null), Foo(1, 1)
 Foo(1, null), Foo(1, 2)
 Foo(1, null), Foo(1, 3)

Please consider updating your solution with my code since it produces a result in the format you asked for, my code is easier to understand, and it has other advantages as you can see:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp40
{
    public class Foo
    {
        public int Id { get; set; }
        public int? NullableId { get; set; }

        public Foo(int id, int? nullableId)
        {
            Id = id;
            NullableId = nullableId;
        }

        public override string ToString() => $"Foo({Id}, {NullableId?.ToString() ?? "null"})";
    }

    class Program
    {

        static void Main(string[] args)
        {
            var first = new List<Foo> { new Foo(1, null), new Foo(1, 5), new Foo(2, 3), new Foo(6, 2) };
            var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(2, null) };

            var result = second.Join(first, s=>s.Id, f=>f.Id, (f, s) => new { f, s })
                .Where(o => !((o.f.NullableId != null && o.s.NullableId != null) &&
                     (o.f.NullableId != o.s.NullableId)));

            foreach (var o in result) {  
                Console.WriteLine(o.f + ", " + o.s);
            }

            Console.ReadLine();
        }
    }
}

Output:

Foo(1, 1), Foo(1, null)
Foo(1, 2), Foo(1, null)
Foo(1, 3), Foo(1, null)
Foo(2, null), Foo(2, 3)

Strange behavior of EqualityComparer with nullable fields

2 Answers2