1

Consider the following classes:

public class Foo
{
    public int Id { get; set; }

    public string Type { get; set; }

    public int BarId { get; set; }
}

public class Bar
{
    public int Id { get; set; }

    public string Name { get; set; }
}

and the following DbContext:

public class TestDbContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }

    public DbSet<Bar> Bars { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(@"Server=.;Database=ConditionalJoinEFCoreTest;Trusted_Connection=True;");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 1, BarId = 1, Type = "Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 2, BarId = 2, Type = "Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 3, BarId = 1, Type = "Not Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 4, BarId = 2, Type = "Not Bar" });

        modelBuilder.Entity<Bar>().HasData(new Bar { Id = 1, Name = "Bar 1" });
        modelBuilder.Entity<Bar>().HasData(new Bar { Id = 2, Name = "Bar 2" });
    }
}

Now let's query the data:

using (var ctx = new TestDbContext())
{
    var joinResult = ctx.Foos.GroupJoin(
      ctx.Bars,
      foo => new { Key = foo.BarId, PropName = foo.Type },
      bar => new { Key = bar.Id, PropName = "Bar" },
      (foo, bars) => new
      {
         Foo = foo,
         Bars = bars
      })
  .SelectMany(
      x => x.Bars.DefaultIfEmpty(),
      (foo, bar) => new 
      {
          Foo = foo.Foo,
          Bar = bar.Name
      });

    var result = joinResult.GroupBy(x => x.Foo.Id).Select(x => new
    {
        Id = x.Key,
        Name = x.Max(r => r.Bar)
    }).ToList();
}

This query will, as expected, yield the following SQL:

SELECT [foo].[Id], [foo].[BarId], [foo].[Type], [bar].[Name] AS [Bar]
FROM [Foos] AS [foo]
LEFT JOIN [Bars] AS [bar] ON ([foo].[BarId] = [bar].[Id]) AND ([foo].[Type] = N'Bar')
ORDER BY [foo].[Id]

However if we define the type:

public class ConditionalJoin
{
    public int Key { get; set; }

    public string PropName { get; set; }
}

...and then modify the LINQ query:

using (var ctx = new TestDbContext())
{
    var joinResult = ctx.Foos.GroupJoin(
      ctx.Bars,
      foo => new ConditionalJoin { Key = foo.BarId, PropName = foo.Type }, // <-- changed
      bar => new ConditionalJoin { Key = bar.Id, PropName = "Bar" }, // <-- changed
      (foo, bars) => new
      {
         Foo = foo,
         Bars = bars
      })
  .SelectMany(
      x => x.Bars.DefaultIfEmpty(),
      (foo, bar) => new 
      {
          Foo = foo.Foo,
          Bar = bar.Name
      });

    var result = joinResult.GroupBy(x => x.Foo.Id).Select(x => new
    {
        Id = x.Key,
        Name = x.Max(r => r.Bar)
    }).ToList();
}

Then the produced SQL looks like:

SELECT [foo0].[Id], [foo0].[BarId], [foo0].[Type]
FROM [Foos] AS [foo0]

SELECT [bar0].[Id], [bar0].[Name]
FROM [Bars] AS [bar0]

Why does this happen?

Dejan Janjušević
  • 3,181
  • 4
  • 41
  • 67
  • 1
    Because the part `new ConditionalJoin` triggers client-side evaluation (you should see warnings on it in the logging). And that, I assume, is because equality of a named type is different than that of an anonymous type. Two `ConditionalJoin` instances having identical property values aren't equal. Two anonymous objects having the same structure and values are. I'd be surprised if both queries yield the same result. – Gert Arnold Jul 07 '19 at 20:57
  • 1
    In a quick test with EF6 and EF core it turns out that EF6 just handles the named type as anonymous type and thus, in fact, disregards the C# equality rules. That may be a more welcome implementation to developers (the query works!) but IMO it's incorrect. EF core is more strict in this respect which I think is better. – Gert Arnold Jul 07 '19 at 21:11
  • @GertArnold thanks for your always useful comments. Is there any way to avoid this problem when generating GroupJoin expressions dynamically? Because when using expression trees, anonymous types can't be used. The only way is to generate a needed type dynamically and then use it. But then we will have a named type just like any other, right? And therefore it will probably also trigger the client-side evaluation of the join, like in the above case. – Dejan Janjušević Jul 07 '19 at 21:15
  • If you want you can find some inspiration [here](https://stackoverflow.com/q/389094/861716). But maybe you should zoom out a bit and reconsider the approach. What you have here is in fact a fixed join and a (possibly) dynamic where clause, `Where(foo => foo.Type == "Bar")`, which is much easier to build at runtime. – Gert Arnold Jul 07 '19 at 21:48
  • @GertArnold not sure if I can zoom out at all. This example is very simplified and shows only one conditional join. In reality, I expect there to be multiple, i.e. one field might contain a foreign key to one of the many tables. There are even likely to be multiple fields containing foreign keys to multiple tables. I think the best approach performance-wise would be to perform the conditional join. I will take a look at the question you linked to see if I can make something out of it. – Dejan Janjušević Jul 08 '19 at 06:20
  • @GertArnold ...although, there is only a small set of columns which could contain the foreign key value (based on the FK type) and only a single column which contains what is `Type` here, so I think I can zoom out a little bit after all! Thanks! Can you make your comments regarding the type reference vs value equality an answer, so I can mark it accepted? – Dejan Janjušević Jul 08 '19 at 08:43

1 Answers1

3

Update (EF Core 3.x+): The issue has been fixed.

Original:

As I was suspecting, the problem is not the anonymous vs concrete type, but the unique C# compiler feature which emits special Expression.New call rather than the normal for such syntax Expression.MemberInit, and this is done only for anonymous types. It's exactly the same problem as in Selection in GroupBy query with NHibernate with dynamic anonymous object, so is the solution - generate class constructor with parameters and generate NewExpression with mapping arguments to class members.

Here is the prove of concept with the static class in question:

public class ConditionalJoin
{
    public ConditionalJoin(int key, string property)
    {
        Key = key;
        Property = property;
    }
    public int Key { get; }
    public string Property { get; }
    public static Expression<Func<T, ConditionalJoin>> Select<T>(Expression<Func<T, int>> key, Expression<Func<T, string>> property)
    {
        var parameter = key.Parameters[0];
        var body = Expression.New(
            typeof(ConditionalJoin).GetConstructor(new[] { typeof(int), typeof(string) }),
            new[] { key.Body, Expression.Invoke(property, parameter) },
            new [] { typeof(ConditionalJoin).GetProperty("Key"), typeof(ConditionalJoin).GetProperty("Property") });
        return Expression.Lambda<Func<T, ConditionalJoin>>(body, parameter);
    }
}

and the usage:

var joinResult = ctx.Foos.GroupJoin(
    ctx.Bars,
    ConditionalJoin.Select<Foo>(foo => foo.BarId, foo => foo.Type),
    ConditionalJoin.Select<Bar>(bar => bar.Id, bar => "Bar"),
    // the rest...

Of course if you want the query to work correctly event if evaluated client side (e.g. LINQ to Objects), the class must correct implement GetHashCode and Equals.


With that being said, actually EF Core supports another simpler alternative solution - using Tuples (not ValueTuples - these are still not supported in expression trees) instead on anonymous/concrete types. So the following also works correctly in EF Core (lloks like they have special case for Tuple types):

var joinResult = ctx.Foos.GroupJoin(
    ctx.Bars,
    foo => new Tuple<int, string>(foo.BarId, foo.Type),
    bar => new Tuple<int, string>(bar.Id, "Bar"),
    // the rest...
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • It works! It made sense what @GertArnold initially said, that it was the equality issue. However that would mean that implementing `IEquatable` would solve that problem, but it made no difference. This one solves the mystery however. Great research, thanks! – Dejan Janjušević Jul 08 '19 at 10:44
  • What @Gert said makes sense only for client side evaluation. Since EF6 is always using server side evaluation, it makes sense to translate it to SQL the way it does. For EF Core with client evaluation it is debatable, but since EF Core 3.0 is going to remove the client evaluation and will rewrite it's query translation engine, I'm expecting this issue to disappear at some point. Anyway, I've added an update for the current EF Core which works well in both cases and does not require generation of dynamic classes or special expressions. – Ivan Stoev Jul 08 '19 at 11:04
  • 1
    Excellent @Ivan! I was hoping you'd chime in because I felt it could be done. But so far I've only ventured knee-deep into this expression stuff. You swim in it! – Gert Arnold Jul 08 '19 at 12:30
  • 1
    @IvanStoev just a FYI, it looks like .EF Core supports `Expression.MemberInit` now (tested with v3.1) - it translates it to SQL correctly. – Dejan Janjušević Jun 10 '20 at 12:31
  • @DejanJanjušević I know, thanks. But they are changing the things too oftet, fixing this, breaking that etc., I can't go back and update all workarounds I've posted at specific time. – Ivan Stoev Jun 10 '20 at 14:01
  • 1
    @IvanStoev yeah no problem, it was not why I pinged you about it, just wanted to share the information with you. But figures you already knew that :) – Dejan Janjušević Jun 10 '20 at 17:27