5

Today I discovered that Entity Framework was adding an unnecessary sub query to the SQL it generates. I started digging my code trying to narrow down where it might come from. A (long) while later I pin-pointed what's causing it. But now I'm more confused than when I started, as I have no clue why it causes it.

Basically what I discovered is that on certain scenarios, simply converting a constant into a variable can alter the SQL that Entity Framework generates. I've shrunk everything to the bare minimum and packed it in a little console app:

using System;
using System.Data.Entity;
using System.Linq;

class Program
{
    private static readonly BlogContext _db = new BlogContext();

    static void Main(string[] args)
    {
        const string email = "foo@bar.com";

        var comments = from c in _db.Comments
                       where c.Email == email
                       select c;

        var result = (from p in _db.Posts
                      join c in comments on p.PostId equals c.PostId
                      orderby p.Title
                      select new { p.Title, c.Content });

        Console.WriteLine(result);
    }
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts { get; set; }
    public DbSet<Comment> Comments { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
    public string Email { get; set; }
    public string Content { get; set; }
}

This shows the following output, which is perfect:

SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM  [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE N'foo@bar.com' = [Extent2].[Email]
ORDER BY [Extent1].[Title] ASC

Now if I make email a variable:

/*const*/ string email = "foo@bar.com";

The output changes radically:

SELECT
[Project1].[PostId] AS [PostId],
[Project1].[Title] AS [Title],
[Project1].[Content] AS [Content]
FROM ( SELECT
        [Extent1].[PostId] AS [PostId],
        [Extent1].[Title] AS [Title],
        [Extent2].[Content] AS [Content]
        FROM  [dbo].[Posts] AS [Extent1]
        INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
        WHERE [Extent2].[Email] = @p__linq__0
)  AS [Project1]
ORDER BY [Project1].[Title] ASC

As a side note, LINQ to SQL does not seem to do this. I know it's probably okay to ignore this, as both commands return the same data. But I'm extremely curious as to why this happens. Up until today I always had the (perhaps false?) impression that it is always safe to turn a constant into a variable, providing that the value remains the same (which in this case does). So I have to ask...

Why a seemingly insignificant change causes such a big difference in the generated SQL?

Update:

Just to be clear, my question isn't about the value of email being a hard-coded value in the first query and a variable in the second (which makes all the sense in the world). My question is about why the variable version results in the extra sub query.

Thanks!

Daniel Liuzzi
  • 16,807
  • 8
  • 52
  • 57
  • I wouldn't worry about the *form* of the query - SQL is meant to be about expressing the results you want, not the process by which to achieve them - and I'd be surprised if there's much or any difference in the query plan. The optimizer should produce much the same query plan. – Damien_The_Unbeliever May 07 '11 at 16:03

4 Answers4

5

The answer is fairly simple. Your LINQ query is expressed with expression trees. The difference with the const variable vs non const one are lies in ConstantExpression and ParameterExpression.

When you use const your LINQ query uses ConstExpression for this variable, and when you use non const it uses ParameterExpression which are interpreted differently by the EF Runtime.

Constant actually means that the value will never change and the value can be inlined into the query.

Ivan Zlatanov
  • 5,146
  • 3
  • 29
  • 45
  • And this reminds me I still have plenty to learn about LINQ in general and expression trees in particular. Thanks a lot for those links, Ivan. – Daniel Liuzzi May 07 '11 at 16:25
  • 1
    This is probably really important. If you say it's a constant, and it's not, then it will generate and cache a brand new query every time in SQL server. On the other hand, if it's a variable, the query itself may be the same despite different values for the variable, so it will only be generated and cached once. So... if THE QUERY will be run with different values over time, then leave it a variable. If it's truly constant within your app, then use a constant. Simple. – Triynko Jul 13 '17 at 13:43
3

NOT an answer to the question - just context on using Parameters.

This is to do with creating a query such that it will re-use existing query plans.

If you inject the variable (as opposed to a reference to a parameter) into the generate SQL, then SQL Server (and probably other database engines) will not be able to re-use the same plan when the variable changes.

For constants this is not an issue, because you know the value is always the same, but for variables each time the query is executed the SQL and thus query plan would be slightly different.

This might not sound like much, but SQL only has a certain amount of space assigned for query plans, so having hundreds/thousands of minor variations in the cache is a real 'waste of space' so to speak!

Alex James
  • 20,874
  • 3
  • 50
  • 49
  • 1
    +1 I think this is the answer to the question. Moreover it looks like very clear explanation to one of the biggest EF mysteries. – Ladislav Mrnka May 07 '11 at 21:52
1

Is this actually a big difference in the SQL? The inner query is the same as the original query, and the outer query is just a wrapper over the inner that doesn't change the result set.

Unless this is causing problems, I personally wouldn't worry about it. Do the query plans differ between the two flavours of query? My guess is that they're identical.

Will A
  • 24,780
  • 5
  • 50
  • 61
  • Thanks Will! Well, in this context the difference is probably negligible. But the actual code I'm working on is the data access layer of a *much* larger application, that swings IQueryables all over the place. I'm just afraid that if a little change like this results in an extra query being generated, then maybe larger changes will eventually result in some nasty T-SQL mess in the future. The main reason I care about this is because I love using LINQ and plan to keep using it exclusively (without having to resort to sprocs when things get messy.) – Daniel Liuzzi May 07 '11 at 16:12
  • 1
    @Daniel - you're not actually getting an extra query here, though - you still have just _one_ query - albeit a slightly more complex looking query. SQL Server will have no problems optimizing away the outer part of the query - honest! – Will A May 07 '11 at 16:17
  • Yes, I know the DB is hit just once, and yes, I really want to believe SQL Server is smart enough to figure the outer query is redundant. I just couldn't help myself and needed to ask and make sure :) – Daniel Liuzzi May 07 '11 at 16:29
  • Could have a performance impact. I just ran into an issue with speed on a fairly long query. Changing a `string` to a `const string` actually resulted in a 10x faster query. Only one query was generated in either instance. – General Grievance Feb 16 '22 at 17:08
1

Like people said. Difference between both queries is minimal.

The reason is that the expression, that is created when you create your LINQ is different when you use variable and when constant. And EF will catch this and will generate your SQL acordingly. It knows it will never change, so it can be hard-coded into the query for (possible) performance gain.

Edit: I dont think there is answer to this question except "Thats how EF does it." But it is very well known that EF likes to creates many subselects. It can result for many subselects for more complex queries. Some even dismis even using EF for this fact. But this is simply price for using tool like EF. You loose fine-grained control over something, that can have big performance gain. Why do you use .NET, when you can use C and gain more performance? Why use C when you can use assembly, to gain more performance gain?

Only way to be safe and still be able to use high-abstraction layer EF is to use SQL profiller often and check if there are not queries that take too long on real data. And if your find some then either conver them to direct SQL or stored procedures.

Euphoric
  • 12,645
  • 1
  • 30
  • 44
  • Thanks Euphoric. Yes, I noticed the value is hard-coded when using the constant. This I understand and it makes perfect sense. My question was about the sub query though. – Daniel Liuzzi May 07 '11 at 16:16