Why does Entity Framework generate slow overengineered SQL?

Question

I have this code:

DbSet<TableName> table = ...// stored reference

var items = from n in table where
            n.Name.ToUpper().Contains(searchString.ToUpper().Trim())
            select n;
WriteToLog( items.ToString() );

The last line outputs the generated SQL. Here's what I get:

SELECT 
    [Extent1].[Name] AS [Name],
    // all the other columns follow
FROM (SELECT 
  [TableName].[Name] AS [Name],
  // all the other columns follow
FROM [dbo].[TableName] AS [TableName]) AS [Extent1]
WHERE ( CAST(CHARINDEX(LTRIM(RTRIM(UPPER(@p__linq__0))), UPPER([Extent1].[Name])) AS int)) > 0

You see, there's SELECT-from-SELECT although it's completely redundant - one SELECT would be just enough. The code using EF runs longer than half a minute and time out on that query although the table is rather small.

Why is this overengineered SQL query generated and how do I make EF generate a better query?

The query optimiser will make short work of that. I would argue that it's generating code that's not human-readable, but that's not a problem, since it's not designed to do that. If you had some measured evidence that the SQL is slow that would be great, but I would imagine it isn't and you're mistaken. Do you? — Kieren Johnstone, Oct 09 '12 at 12:00
As to how you can improve it, see http://weblogs.asp.net/scottgu/archive/2012/07/19/entity-framework-and-open-source.aspx — Ian Mercer, Oct 09 '12 at 12:01
@Kieren Johnstone: Well, the query runs longer than half a minute and times out although it's quite trivial. — sharptooth, Oct 09 '12 at 12:03
I'd suggest `searchString.ToUpper().Trim()` should be resolved outside of the query so that you don't offload that work onto sql server — spender, Oct 09 '12 at 12:04
@sharptooth That piece of string seems fairly long, but without a ruler or another piece of string, it could be incredibly short in comparison.. (how long is your piece of string? what are you measuring it against?) — Kieren Johnstone, Oct 09 '12 at 12:05
You're also potentially dispensing with any indexes you place on column `Name` by calling `ToUpper`. You'd be better using a db collation that isn't case sensitive. — spender, Oct 09 '12 at 12:06
I agree with Kieren. The effect of the sub-query will be hardly measurable, and almost non-existant with query plan caching. Any performance issues will arise from the `WHERE` clause - It will totally obfuscate any available indexes. If you were able to refactor your data to *not* require use of `UPPER` and `CONTAINS` you'd have a much more SQL friendly query. That may not be possible, but using either of these is always going to have **massive** performance consequences.) *[You're effectively doing `Name LIKE '%xxx%'` and anyone experienced with SQL will tell you **that** is the problem.]* — MatBailie, Oct 09 '12 at 12:08

score 8 · Accepted Answer · answered Oct 09 '12 at 12:04

It generates the resulting SQL by transforming an expression tree. It appears overengineered (for example, using a subquery) as a side-effect of the way it's done that transformation. The details of the transformation are proprietary and complex, and the results are not supposed to be human-readable.

The question is not entirely clear - and you are trying to solve a problem which I believe may not be a problem. Try comparing the generated query and your own - I would guess the query optimiser will make short work of such an easy optimisation.

My guess (and that's probably the best kind of answer you can get here unless a LINQ to Entities MS dev comes along) is that they're doing exactly that: generating the most effective query, but leaving the head-hurtingly-difficult job of optimising the query to the bit they've already put hundreds or thousands of man-days into: the query optimiser in SQL Server.

score 3 · Answer 2 · answered Oct 09 '12 at 12:52

It does an extra Select but Selects have no cost associated. You can see the estimated query plan and it would show 0% cost in that. It does that because EF is compatible with various RDBMS systems like Oracle, SQL server and to ensure maximum compatibility it might be doing this.

However I do agree that Entity Framework generates UGLY sql. The example that you gave was a very simple Linq query and you'll see more of that ugliness when your queries start becoming complex.

1) While this may or may not answer your answer, I would say use a micro ORM like PetaPoco:

https://github.com/toptensoftware/PetaPoco

or Dapper.Net

https://github.com/SamSaffron/dapper-dot-net

I have been using it in one of my project and I am completely satisfied with the raw speed that you get with plain Ado.Net.

2) My 2nd suggestion would be always use stored procedure for atleast Select statements. For inserts, updates and deletes you should probably use EF and take advantage of change tracking mechamism and that would save your time from writing tedious queries but atleast for Select statements you should try to use plain SQL and that gives you more freedom over what SQL is generated.

Good points, but mixing SQL and EF is asking for trouble in my opinion — Kieren Johnstone, Oct 09 '12 at 15:08
@KierenJohnstone that really depends on the intended use of the results of custom SQL queries. E.g. I can display a list of results from a stored procedure which gives me way more control than relying on EF to make the most efficient query. If I don't actually use the result of a stored proc to then make updates to save back to the DB via EF, I see no problem. — A. Murray, Nov 07 '14 at 11:25

Why does Entity Framework generate slow overengineered SQL?

2 Answers2

Linked