2

I've got an IQueryable Extension method that is being used to reduce the amount of boiler plate code needed to search a number of fields in an EF Core DbContext model:

public static IQueryable<TEntity> WherePropertyIsLikeIfStringIsNotEmpty<TEntity>(this IQueryable<TEntity> query,
    string searchValue, Expression<Func<TEntity, string>> propertySelectorExpression)
{
    if (string.IsNullOrEmpty(searchValue) || !(propertySelectorExpression.Body is MemberExpression memberExpression))
    {
        return query;
    }
    
    // get method info for EF.Functions.Like
    var likeMethod = typeof(DbFunctionsExtensions).GetMethod(nameof(DbFunctionsExtensions.Like), new []
    {
        typeof(DbFunctions),
        typeof(string),
        typeof(string)
    });
    var searchValueConstant = Expression.Constant($"%{searchValue}%");
    var dbFunctionsConstant = Expression.Constant(EF.Functions);
    var propertyInfo = typeof(TEntity).GetProperty(memberExpression.Member.Name);
    var parameterExpression = Expression.Parameter(typeof(TEntity));
    var propertyExpression = Expression.Property(parameterExpression, propertyInfo);
    
    
    var callLikeExpression = Expression.Call(likeMethod, dbFunctionsConstant, propertyExpression, searchValueConstant);
    var lambda = Expression.Lambda<Func<TEntity, bool>>(callLikeExpression, parameterExpression);
    return query.Where(lambda);
}

The code is working and producing expected results, however I was worried that I would get a performance hit for using Expressions and a bit of reflection. So I set up a benchmark using an in memory database and the BenchmarkDotNet nuget package. Here is the benchmark:

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;

class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Benchmark>();
        }
    }

    public class Benchmark
    {
        private Context _context;
        private string SearchValue1 = "BCD";
        private string SearchValue2 = "FG";
        private string SearchValue3 = "IJ";
        
        [GlobalSetup]
        public void Setup()
        {
            _context = new Context(new DbContextOptionsBuilder<Context>().UseInMemoryDatabase(Guid.NewGuid().ToString())
                .Options);

            _context.TestModels.Add(new TestModel(1, "ABCD", "EFGH", "HIJK"));
            _context.SaveChanges();
        }

        [GlobalCleanup]
        public void Cleanup()
        {
            _context.Dispose();
        }
        
        [Benchmark]
        public void FilterUsingExtension()
        {
            var _ = _context.TestModels
                .WherePropertyIsLikeIfStringIsNotEmpty(SearchValue1, testModel => testModel.Value)
                .WherePropertyIsLikeIfStringIsNotEmpty(SearchValue2, testModel => testModel.OtherValue)
                .WherePropertyIsLikeIfStringIsNotEmpty(SearchValue3, testModel => testModel.ThirdValue)
                .ToList();
        }

        [Benchmark]
        public void FilterTraditionally()
        {
            var query = _context.TestModels.AsQueryable();
            if (!string.IsNullOrEmpty(SearchValue1))
            {
                query = query.Where(x => EF.Functions.Like(x.Value, $"%{SearchValue1}%"));
            }
            if (!string.IsNullOrEmpty(SearchValue2))
            {
                query = query.Where(x => EF.Functions.Like(x.OtherValue, $"%{SearchValue2}%"));
            }
            if (!string.IsNullOrEmpty(SearchValue3))
            {
                query = query.Where(x => EF.Functions.Like(x.ThirdValue, $"%{SearchValue3}%"));
            }
        
            var _ = query.ToList();
        }
    }

    public class TestModel
    {
        public int Id { get; }
        public string Value { get; }
        public string OtherValue { get; }
        public string ThirdValue { get; }

        public TestModel(int id, string value, string otherValue, string thirdValue)
        {
            Id = id;
            Value = value;
            OtherValue = otherValue;
            ThirdValue = thirdValue;
        }
    }
    
    public class Context : DbContext
    {

        public Context(DbContextOptions<Context> options)
            : base(options)
        {
            
        }
        
        // ReSharper disable once UnusedAutoPropertyAccessor.Global
        public DbSet<TestModel> TestModels { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);

            modelBuilder.Entity<TestModel>().ToTable("test_class", "test");
            modelBuilder.Entity<TestModel>().Property(x => x.Id).HasColumnName("id").HasColumnType("int");
            modelBuilder.Entity<TestModel>().Property(x => x.Value).HasColumnName("value").HasColumnType("varchar")
                .ValueGeneratedNever();
            modelBuilder.Entity<TestModel>().Property(x => x.OtherValue).HasColumnName("other_value").HasColumnType("varchar")
                .ValueGeneratedNever();
            modelBuilder.Entity<TestModel>().Property(x => x.ThirdValue).HasColumnName("third_value").HasColumnType("varchar")
                .ValueGeneratedNever();
            modelBuilder.Entity<TestModel>().HasKey(x => x.Id);
        }
    }

Like I said, I was expecting performance penalties for using reflection. but the benchmark shows that the query being built by my extension method is more than 10 times faster than just writing the expression directly in the Where method:

|               Method |        Mean |     Error |    StdDev |      Median |
|--------------------- |------------:|----------:|----------:|------------:|
| FilterUsingExtension |    73.73 us |  1.381 us |  3.310 us |    72.36 us |
|  FilterTraditionally | 1,036.60 us | 20.494 us | 22.779 us | 1,032.69 us |

Can anyone give an explanation for this?

MrWalrus
  • 33
  • 5
  • What SQL is yours generating? What SQL is the other generating? – mjwills Jul 16 '21 at 03:45
  • An In-Memory database is always going to benchmark faster. Reflection has little to do with this. Are you asking why an In-Memory database performs faster in a benchmark than a traditional SQL-backed DBContext? – Matthew M. Jul 16 '21 at 05:26
  • No idea, it should be a little bit slower. Check, maybe JIT optimised out whole test body? – Svyatoslav Danyliv Jul 16 '21 at 08:51

1 Answers1

6

Shortly, the difference is coming from the different expressions for pattern parameter of EF.Functions.Like, and the way LINQ to Objects (used by EF Core InMemory provider) processes IQueryable expression tree.

First off, performance test with EF Core InMemory provider against small set of data is irrelevant, since it is measuring basically the query expression tree building, while in the case of real database, most of the time is executing the generated SQL query, returning and materializing the result data set.

Second, regarding

I was worried that I would get a performance hit for using Expressions and a bit of reflection

Both approaches build query expression tree at runtime using Expression class methods. The only difference is that the C# compiler generates that code for you at compile time, thus has no reflection calls. But your code could easily be modified to avoid reflection as well, thus making the generation fully equivalent.

More important difference is that your code is emitting ConstantExpression, while currently C# compiler has no way to generate constant expressions from variables, so it always emits closures, which in turn are bound as query parameters by the EF Core query translator. Which in general is recommended for SQL queries, so you'd better do the same inside your method, or have an option to do so.

So, to recap shortly, your method binds constant expression, and compiler method binds closure. But not only. See here

query.Where(x => EF.Functions.Like(x.Value, $"%{SearchValue1}%"))

SearchValue1 variable is converted to closure, but since $"%{SearchValue1}%" is part of expression, it's not evaluated at that point, but is recorded as MethodCallExpression to string.Format.

These two gives the big performance difference in LINQ to Objects, since it executes query expression trees by first compiling expressions to delegate(s), and then running it. So at the end your code passes constant value, and compiler generated query code calls string.Format. And there is a big difference in compilation/execution time between the two. Multiplied by 3 in your test.


With all that being said, let see it in action.

First, the optimized extension method with one-time static reflection info caching and option for using constant or variable:

public static IQueryable<TEntity> WhereIsLikeIfStringIsNotEmpty<TEntity>(
    this IQueryable<TEntity> query,
    string searchValue,
    Expression<Func<TEntity, string>> selector,
    bool useVariable = false)
{
    if (string.IsNullOrEmpty(searchValue)) return query;
    var parameter = selector.Parameters[0];
    var pattern = Value($"%{searchValue}%", useVariable);
    var body = Expression.Call(LikeMethod, DbFunctionsArg, selector.Body, pattern);
    var predicate = Expression.Lambda<Func<TEntity, bool>>(body, parameter);
    return query.Where(predicate);
}

static Expression Value(string value, bool variable)
{
    if (!variable) return Expression.Constant(value);
    return Expression.Property(
        Expression.Constant(new StringVar { Value = value }),
        StringVar.ValueProperty);
}

class StringVar
{
    public string Value { get; set; }
    public static PropertyInfo ValueProperty { get; } = typeof(StringVar).GetProperty(nameof(Value));
}

static Expression DbFunctionsArg { get; } = Expression.Constant(EF.Functions);

static MethodInfo LikeMethod { get; } = typeof(DbFunctionsExtensions).GetMethod(nameof(DbFunctionsExtensions.Like), new[]
{
    typeof(DbFunctions),
    typeof(string),
    typeof(string)
});

Note that I removed the Property from the method name and the requirement for MemberExpression, since it is not needed - the method will work with any string returning expression.

Second, add two new benchmark methods for it:


[Benchmark]
public void FilterUsingExtensionOptimizedUsingConstant()
{
    var _ = _context.TestModels
        .WhereIsLikeIfStringIsNotEmpty(SearchValue1, testModel => testModel.Value, false)
        .WhereIsLikeIfStringIsNotEmpty(SearchValue2, testModel => testModel.OtherValue, false)
        .WhereIsLikeIfStringIsNotEmpty(SearchValue3, testModel => testModel.ThirdValue, false)
        .ToList();
}

[Benchmark]
public void FilterUsingExtensionOptimizedUsingVariable()
{
    var _ = _context.TestModels
        .WhereIsLikeIfStringIsNotEmpty(SearchValue1, testModel => testModel.Value, true)
        .WhereIsLikeIfStringIsNotEmpty(SearchValue2, testModel => testModel.OtherValue, true)
        .WhereIsLikeIfStringIsNotEmpty(SearchValue3, testModel => testModel.ThirdValue, true)
        .ToList();
}

Finally, add benchmark for optimized version of "traditional way" which avoids string.Format in the expression tree (but still binds variable):

[Benchmark]
public void FilterTraditionallyOptimized()
{
    var query = _context.TestModels.AsQueryable();
    if (!string.IsNullOrEmpty(SearchValue1))
    {
        var pattern = $"%{SearchValue1}%";
        query = query.Where(x => EF.Functions.Like(x.Value, pattern));
    }
    if (!string.IsNullOrEmpty(SearchValue2))
    {
        var pattern = $"%{SearchValue2}%";
        query = query.Where(x => EF.Functions.Like(x.OtherValue, pattern));
    }
    if (!string.IsNullOrEmpty(SearchValue3))
    {
        var pattern = $"%{SearchValue3}%";
        query = query.Where(x => EF.Functions.Like(x.ThirdValue, pattern));
    }

    var _ = query.ToList();
}

The results:

Method Mean Error StdDev
FilterUsingExtension 51.84 us 0.089 us 0.079 us
FilterUsingExtensionOptimizedUsingConstant 48.95 us 0.061 us 0.054 us
FilterUsingExtensionOptimizedUsingVariable 58.40 us 0.354 us 0.331 us
FilterTraditionally 625.40 us 1.269 us 1.187 us
FilterTraditionallyOptimized 60.09 us 0.491 us 0.435 us

As we can see, optimized extension method using constant is fastest, but very close to your original (which means the reflection is not essential).

The variant with variable is bit slower, but generally will be better when used against real database.

The optimized "traditional" method is bit slower than the previous two, which is kind of surprising, but the difference is negligible.

The original "traditional" method is a way slower than all previous, due to the aforementioned reasons. But against real database it would be negligible part of the overall query execution.

Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • 1
    Ah, missed string interpolation. Good catch. – Svyatoslav Danyliv Jul 16 '21 at 10:00
  • 1
    Oh, that makes a lot of sense. I completely missed the string interpolation. Thanks so much for the detailed answer – MrWalrus Jul 17 '21 at 04:10
  • 1
    Hi Ivan, Can you make me a favor please, when you have some time? I am asking you because you are the best EF expert of SO. I am answering the question https://stackoverflow.com/questions/68471904/ef-core-relationship-mapping-between-itself/68472971?noredirect=1#comment121058121_68472971 and wondering could you find some time to help me please? I would like to know if I can add to Member the property ICollection Recruits. Is it possible? Thanks in advance. PS. I am not creating the post since it is not my question. – Serge Jul 23 '21 at 14:49
  • Hi @Serge, Sorry for the late response, I'm quite busy these days. Sure I'll take a look at it. – Ivan Stoev Jul 24 '21 at 04:41