57

I came across an issue with EF creating terrible queries when searching on a string field. Its produced a query in the style of lazy programmers to encompass null checking which forces the whole index to be scanned.

consider the following queries.

  1. Query 1

    var p1 = "x";
    var r1 = ctx.Set<E>().FirstOrDefault(
                            subject =>
                                p1.Equals(subject.StringField));
    
  2. Query 2

    const string p2 = "x";
    var r2 = ctx.Set<E>().FirstOrDefault(
                            subject =>
                                p2.Equals(subject.StringField));
    

Query 1 produces

WHERE (('x' = [Extent2].[StringField]) OR (('x' IS NULL) AND ([Extent2].[StringField] IS NULL))) 

and executes in 4 seconds

Query 2 produces

WHERE (N'x' = [Extent2].[StringField]) 

and executes in 2 milliseconds

Does anyone know of any work arounds? (no the parameter cant be a const as it is entered by user input but cannot be null.)

N.B When profiled, both queries are prepared with sp_executesql by EF; as of-cause if they were just executed the query optimiser would negate the OR 'x' IS NULL check.

for @Martin

Mark
  • 1,544
  • 1
  • 14
  • 26
  • Did you try `subject.StringField == p1`? – Yacoub Massad Jul 18 '16 at 09:51
  • 1
    Yes. I tried every variation. The reason I wrote p1.Equals was to try and trick EF that p1 cannot be null or a null reference exception would be thrown – Mark Jul 18 '16 at 09:53
  • 1
    This was an answer by Paul white..http://stackoverflow.com/questions/17323547/should-the-order-of-linq-query-clauses-affect-entity-framework-performance/17873031#17873031, may be helpfull i am not sure – TheGameiswar Jul 18 '16 at 10:00
  • Query 1 isn't normally a pattern that causes problems for SQL Server. It usually gets treated the same as the intersect version here http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx – Martin Smith Jul 18 '16 at 10:19
  • related question http://stackoverflow.com/questions/34472252/entity-framework-database-query-very-slow – Mark Jul 18 '16 at 10:30
  • @MartinSmith its not but the query isn't executed exactly as i have posted it. that is output from our query logger the actual query 'x' is p_linq__1 – Mark Jul 18 '16 at 10:32
  • @Mark - It'd be interesting to see the actual query. e.g here we get a seek. http://i.stack.imgur.com/eEO8o.png – Martin Smith Jul 18 '16 at 10:33
  • @MartinSmith updated with plan etc. Had to blank some things out... politics and all – Mark Jul 18 '16 at 12:20
  • @Mark - That plan doesn't look catastrophic by any means. It is full of seeks and the lines are very thin. Maybe you have a statistics issue (possibly parameter sniffing) where it estimates 1 row but actual is much higher so the number of lookups is much larger than it anticipates? – Martin Smith Jul 18 '16 at 13:28
  • can't you add a AND subject is NOT NULL check to your call ? – Falco Jul 18 '16 at 15:02
  • @MartinSmith thanks for the suggestion I've just updated the stats just encase and the estimate is now 1.18 not 1.14 so its actually worse haha. – Mark Jul 18 '16 at 15:34
  • @falco good suggestion I didn't think of that. Unfortunately I have just tried that and it just gets ignored. – Mark Jul 18 '16 at 15:37

3 Answers3

45

Set UseDatabaseNullSemantics = true;

  • When UseDatabaseNullSemantics == true, (operand1 == operand2) will be translated as:

    WHERE operand1 = operand2
    
  • When UseDatabaseNullSemantics == false, (operand1 == operand2) will be translated as:

    WHERE
        (
            (operand1 = operand2)
            AND
            (NOT (operand1 IS NULL OR operand2 IS NULL))
        )
        OR
        (
            (operand1 IS NULL)
            AND
            (operand2 IS NULL)
        )
    

This is documented by Microsoft:

Gets or sets a value indicating whether database null semantics are exhibited when comparing two operands, both of which are potentially nullable. The default value is false.

You can set it in your DbContext subclass constructor, like so:

public class MyContext : DbContext
{
    public MyContext()
    {
        this.Configuration.UseDatabaseNullSemantics = true;
    }
}

Or you can also set this setting to your dbContext instance from the outside like the code example below, from my point of view (see @GertArnold comment), this apporach will be better, because it will not change the default database behaviour or configuration):

myDbContext.Configuration.UseDatabaseNullSemantics = true;
Dai
  • 141,631
  • 28
  • 261
  • 374
Bassam Alugili
  • 16,345
  • 7
  • 52
  • 70
  • 1
    But this also shows that the null checks aren't "unnesessary". They prevent unexpected behavior, when `var p1` is `null`. That's why `UseDatabaseNullSemantics` is false by defautl, so you get (the expected) .Net null semantics. – Gert Arnold Jul 18 '16 at 10:24
  • @GertArnold yes he can put it from outside, for those spcial use cases! I prefer it also like that myDbContext.Configuration.UseDatabaseNullSemantics = true; and leaving the default configuration for the rest i will update the answer – Bassam Alugili Jul 18 '16 at 10:28
  • @GertArnold In a general case, the context flag determines whether SQL or C# null-comparison rules are used. Since the OP's _operand1_ is the parameter and it cannot be null, those null comparisons are unnecessary; SQL null-comparisons work equivalent in this case. That's what is expected. – Suncat2000 Aug 14 '17 at 21:16
  • What about ef core? – jjxtra Dec 01 '19 at 16:22
9

You can fix this by adding [Required] on StringField property

public class Test
{
    [Key]
    public int Id { get; set; }
    [Required]
    public string Bar{ get; set; }
    public string Foo { get; set; }

}


 string p1 = "x";
 var query1 = new Context().Tests.Where(F => p1.Equals(F.Bar));

 var query2 = new Context().Tests.Where(F => p1.Equals(F.Foo));

this is query1

{SELECT [Extent1].[Id] AS [Id], [Extent1].[Bar] AS [Bar], [Extent1].[Foo] AS [Foo] FROM [dbo].[Tests] AS [Extent1] WHERE @p__linq__0 = [Extent1].[Bar]}

and this is query2

{SELECT [Extent1].[Id] AS [Id], [Extent1].[Bar] AS [Bar], [Extent1].[Foo] AS [Foo] FROM [dbo].[Tests] AS [Extent1] WHERE (@p__linq__0 = [Extent1].[Foo]) OR ((@p__linq__0 IS NULL) AND ([Extent1].[Bar2] IS NULL))}

Kahbazi
  • 14,331
  • 3
  • 45
  • 76
  • The field can be null however the query will never search for nulls. – Mark Jul 18 '16 at 10:14
  • also I believe ef should produce two possible queries for a string based on the parameter being null or not. Not a lazy one where it leaves it up-to SQL Server reminds me of dev's of old writing similar things in SP's because they couldn't be bothered to write two SP's – Mark Jul 18 '16 at 10:16
  • @Mark but if you have a query with 20 parameters, each is nullable and the user can search for any combination, the query-builder would create an individual query for each combination of null/not null parameters - to a maximum of 2^20 queries in total. This can kill your statement-cache and drop performance. This was one of the problems before prepared statements, when people wrote all queries by string concatenation... – Falco Jul 18 '16 at 15:01
  • True if the cache was cycled quite frequently. But would it drop performance by more than 2000x? Because that's what i'm seeing by getting it to output the shared queries. Also IMO and experience its highly unlikely that a user would really search a single table by 20 nullable fields. I'd argue the database isn't properly in TNF. – Mark Jul 18 '16 at 15:57
  • This looks like code-first and a different "using" namespace. Otherwise, `Required` is an ASP.Net display attribute and would have no bearing on generated SQL. But if code-first is an option, this is a good suggestion. – Suncat2000 Aug 14 '17 at 21:19
3

A colleague of mine has just found a really really nice solution. Since I already discovered that using constants produces the correct SQL. We wondered if we could swap out the variables in the expression with constants; and as it turns out you can. I believe this method to be less invasive than changing the null settings on the DB context.

public class Foo_test : EntityContextIntegrationSpec
        {

            private static string _foo = null;

            private static DataConnection _result;

            private Because _of = () => _result = EntityContext.Set<E>().Where(StringMatch<E>(x => x.StringField));

            private static Expression<Func<TSource, bool>> StringMatch<TSource>(Expression<Func<TSource, string>> prop)
            {
                var body = Expression.Equal(prop.Body, Expression.Constant(_foo));
                return Expression.Lambda<Func<TSource,bool>>(body, prop.Parameters[0]);                
            }

            [Test] public void Test() => _result.ShouldNotBeNull();
        }
Mark
  • 1,544
  • 1
  • 14
  • 26
  • While this does generate better SQL, the actual execution time is much worse. I believe this is for two reasons. 1-overhead of building the expression. 2-This causes the query plan to not be cachable, which actually adds more db overhead to every call more than the extra work to do the null check – Jason Coyne Mar 20 '19 at 17:47
  • In our use case this was not true. The execution time was orders of magnitude faster than the null check. Please see the stats stated in the question. – Mark Mar 21 '19 at 18:27
  • 1
    In the generated SQL, using this code results in using a constant value in the SQL, instead of a parameterized query. Unfortunately this means that changing the value results in a new constant in the SQL, which means the query must be compiled and planned again. For smaller queries, that are run frequently with different values, it seems this quickly outweighs the cost of the additional null checks. – Jason Coyne Mar 26 '19 at 13:57