I have encountered some odd behaviour while implementing a GroupJoin with a custom IEqualityComparer.

The following code demonstrates the behaviour that is causing me problems:

List<String> inner = new List<string>() { "i1", "i2" };
List<String> outer = new List<string>() { "o1", "o2" };

var grouped = outer.GroupJoin(inner, i => i, o=> o, (inKey, outCollection) => new {Key = inKey, List = outCollection},
        new EqualityComparer<string>((i, o) => i == o)).ToList();

From the docs found on MSDN, I would expect the last parameter to be passed a series of inner keys and outer keys for comparison.

However, placing a breakpoint inside the Func shows that both i and o start with the letter i and are in fact both elements of the inner collection, so the grouped object is always empty (I know the example will always be empty; it's just the smallest bit of code that demonstrates the problem).

Is there a way to GroupJoin objects with a custom comparator?

For completeness, this is the EqualityComparer that is being created in the GroupJoin argument list:

public class EqualityComparer<T> : IEqualityComparer<T>
{
    public EqualityComparer(Func<T, T, bool> cmp)
    {
        this.cmp = cmp;
    }
    public bool Equals(T x, T y)
    {
        return cmp(x, y);
    }

    public int GetHashCode(T obj)
    {
        // Always return 0 so that all keys share one hash bucket and Equals is always called
        return 0;
    }

    public Func<T, T, bool> cmp { get; set; }
}
CurlyPaul
  • Your doc link is to Queryable.GroupJoin, but you're calling Enumerable.GroupJoin. – Jon Skeet Nov 04 '15 at 12:39
  • And it makes things *really* confusing that you're using `outer` and `inner` in the opposite sense to the parameter names. See https://msdn.microsoft.com/en-us/library/vstudio/bb535047(v=vs.100).aspx – Jon Skeet Nov 04 '15 at 12:43
  • @Jon - Thanks! I switched the names around to make it clearer and looked at the correct doc page, but I don't see any difference between the IQueryable version and the IEnumerable one. In the real version of the code it's being called on an IQueryable with the same results – CurlyPaul Nov 04 '15 at 12:57
  • If it's being called on an `IQueryable`, that's interesting because the query provider *might* be able to recognise the custom comparer and convert that to another representation (e.g. SQL)... or it might not (involving bringing all the data locally). – Jon Skeet Nov 04 '15 at 14:03

1 Answer

A GroupJoin operation first needs to build a lookup - basically from each projected key in inner to the elements of inner with that key. That's why you're being passed inner values. This happens lazily in terms of "when the first result is requested" but it will consume the whole of inner at this point.

Then, once the lookup has been built, outer is streamed, one element at a time. At this point, your custom equality comparer should be asked to compare inner keys with outer keys. And indeed, when I add logging to your comparer (which I've renamed to avoid collisions with the framework EqualityComparer<T> type) I see that:

using System;
using System.Linq;
using System.Collections.Generic;

public class Test
{
    public static void Main()
    {
        List<String> inner = new List<string>() { "i1", "i2" };
        List<String> outer = new List<string>() { "o1", "o2" };

        outer.GroupJoin(inner, i => i, o=> o,
            (inKey, outCollection) => new {Key = inKey, List = outCollection},
            new CustomEqualityComparer<string>((i, o) => i == o)).ToList();
    }
}

public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
    public CustomEqualityComparer(Func<T, T, bool> cmp)
    {
        this.cmp = cmp;
    }
    public bool Equals(T x, T y)
    {
        Console.WriteLine("Comparing {0} and {1}", x, y);
        return cmp(x, y);
    }

    public int GetHashCode(T obj)
    {
        // Always return 0 so that all keys share one hash bucket and Equals is always called
        return 0;
    }

    public Func<T, T, bool> cmp { get; set; }
}

Output:

Comparing i1 and i2
Comparing i1 and i2
Comparing i1 and i2
Comparing i2 and o1
Comparing i1 and o1
Comparing i2 and o2
Comparing i1 and o2

Now that's not the only possible implementation of GroupJoin, but it's a fairly obvious one. See my Edulinq post on GroupJoin for more details.
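
To make the two phases described above concrete, here is a minimal sketch of how such a GroupJoin could be written on top of ToLookup. The names GroupJoinSketch and SimpleGroupJoin are hypothetical and chosen only for illustration; this is not the framework's source, just one plausible shape for it, and it omits argument validation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinSketch
{
    // Hypothetical helper: a simplified stand-in for Enumerable.GroupJoin.
    public static IEnumerable<TResult> SimpleGroupJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, IEnumerable<TInner>, TResult> resultSelector,
        IEqualityComparer<TKey> comparer)
    {
        // Phase 1: consume all of inner and group it by key.
        // While the lookup is built, the comparer only ever sees inner keys.
        // Because this is an iterator block, nothing runs until the first
        // result is requested.
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector, comparer);

        // Phase 2: stream outer one element at a time.
        // Each outer key is now compared against the inner keys in the lookup.
        foreach (TOuter outerElement in outer)
        {
            TKey key = outerKeySelector(outerElement);
            // An ILookup returns an empty sequence for a missing key.
            yield return resultSelector(outerElement, lookup[key]);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var inner = new List<string> { "i1", "i2" };
        var outer = new List<string> { "o1", "o2" };

        var grouped = outer.SimpleGroupJoin(
            inner,
            o => o,
            i => i,
            (o, matches) => new { Key = o, Count = matches.Count() },
            StringComparer.Ordinal).ToList();

        foreach (var g in grouped)
        {
            Console.WriteLine("{0}: {1} match(es)", g.Key, g.Count);
        }
    }
}
```

With the sample data every outer element still produces a result, but each group is empty because no inner key equals any outer key.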

Jon Skeet
  • Marvellous! So the comparer is first used to check that the inner elements are unique while adding them to the lookup table, and then goes on to sort the outer elements into the groups. That explains what I saw in this test, but means I have a bug in my real implementation. Thank you for your help here – CurlyPaul Nov 04 '15 at 14:44
  • @CurlyPaul: It doesn't check that the inner elements have unique keys - they may well not do. The point is to create a list of inner elements matching each key. After all, that's the *group* part :) – Jon Skeet Nov 04 '15 at 14:45
  • Right, got it. I think I need another solution as this isn't going to work for me, but I now have a much better understanding of how this works – CurlyPaul Nov 04 '15 at 15:32
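
As an illustration of the point made in the comments above (inner keys need not be unique, and several inner elements can land in the same group), here is a small sketch using a custom comparer with the real Enumerable.GroupJoin. SuffixComparer and its "ignore the first character" rule are assumptions invented for this example, not the asker's actual requirement. Note that GetHashCode is kept consistent with Equals; returning a constant would also work but forces every key through Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: treats two strings as equal when everything
// after the first character matches, so "i1" and "o1" are "equal".
public sealed class SuffixComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(Suffix(x), Suffix(y), StringComparison.Ordinal);
    }

    public int GetHashCode(string obj)
    {
        // Must agree with Equals: equal suffixes give equal hash codes.
        return Suffix(obj).GetHashCode();
    }

    private static string Suffix(string s)
    {
        return string.IsNullOrEmpty(s) ? string.Empty : s.Substring(1);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Two inner elements ("i1" and "x1") share the same key under SuffixComparer.
        var inner = new List<string> { "i1", "i2", "x1" };
        var outer = new List<string> { "o1", "o2" };

        var grouped = outer.GroupJoin(
            inner,
            o => o,   // outer key selector
            i => i,   // inner key selector
            (o, matches) => new { Key = o, Matches = matches.ToList() },
            new SuffixComparer()).ToList();

        foreach (var g in grouped)
        {
            Console.WriteLine("{0}: {1}", g.Key, string.Join(", ", g.Matches));
        }
        // Prints:
        // o1: i1, x1
        // o2: i2
    }
}
```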