1

Suppose I have a List<T> with 1000 items in it.

I'm then passing this to a method that filters this List. As it drops through the various cases (for example there could be 50), List<T> may have up to 50 various Linq Where() operations performed on it.

I'm interested in this running as quickly as possible. Therefore, I don't want this List<T> filtered each time a Where() is performed on it.

Essentially I need it to defer the actual manipulation of the List<T> until all the filters have been applied.

Is this done natively by the compiler? Or just when I call .ToList() on the IEnumerable that a List<T>.Where() returns, or should I perform the Where() operations on X (Where X = List.AsQueryable())?

Hope this makes sense.

maxp
  • 24,209
  • 39
  • 123
  • 201

3 Answers3

4

Yes, deferred execution is supported natively. Every time you apply a query or lambda expression on your List the query stores all the expressions that is executed only when you call .ToList() on the Query.

this. __curious_geek
  • 42,787
  • 22
  • 113
  • 137
  • Do you know what sort of speed penalty there is on calling List.AsQueryable().ToList() (i.e. a list with no expressions called on it)? – maxp Jun 17 '10 at 10:40
  • Edit: 32767 x List.AsQueryable().ToList() took 120ms so basically negligible. – maxp Jun 17 '10 at 10:59
3

Each call to Where will create a new object which knows about your filter and the sequence it's being called on.

When this new object is asked for a value (and I'm being deliberately fuzzy between an iterator and an iterable here) it will ask the original sequence for the next value, check the filter, and either return the value or iterate back, asking the original sequence for the next value etc.

So if you call Where 50 times (as in list.Where(...).Where(...).Where(...), you end up with something which needs to go up and down the call stack at least 50 times for each item returned. How much performance impact will that have? I don't know: you should measure it.

One possible alternative is to build an expression tree and then compile it down into a delegate at the end, and then call Where. This would certainly be a bit more effort, but it could end up being more efficient. Effectively, it would let you change this:

list.Where(x => x.SomeValue == 1)
    .Where(x => x.SomethingElse != null)
    .Where(x => x.FinalCondition)
    .ToList()

into

list.Where(x => x.SomeValue == 1 && x.SomethingElse != null && x.FinalCondition)
    .ToList()

If you know that you're just going to be combining a lot of "where" filters together, this may end up being more efficient than going via IQueryable<T>. As ever, check performance of the simplest possible solution before doing something more complicated though.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • The previous code attacked this problem by using the Linq Dynamic library (http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx), which just built a long 'where' string. The idea is nice, but without timing it, im still confident it is slower than just shotgunning where's together as it has to parse the string bleh. – maxp Jun 17 '10 at 11:04
  • @maxp: Doing it dynamically would probably be slower, yes - but you don't have to do it dynamically. You can use expression trees to do all of this safely, and build it into "native" IL using LambdaExpression.Compile. – Jon Skeet Jun 17 '10 at 11:07
  • I wanted to comment on your answer by telling about the performance optimizations for arrays and List inside of the `Enumerable.Where` extension methods. However, I noticed that these optimizations have no impact on the correctness of your answer. While these optimizations prevent 50 iterators to be wrapped, they won't prevent 50 delegates to be wrapped and thus won't prevent a 50 calls deep call stack. Best for performance would indeed be one single delegate. – Steven Jun 17 '10 at 11:15
0

There is so much failure in the question and the comments. The answers are good but don't hit hard enough to break through the failure.

Suppose you have a list and a query.

List<T> source = new List<T>(){  /*10 items*/ };
IEnumerable<T> query = source.Where(filter1);
query = query.Where(filter2);
query = query.Where(filter3);
...
query = query.Where(filter10);

Is [lazy evaluation] done natively by the compiler?

No. Lazy evaluation is due to the implementation of Enumerable.Where

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.


speed penalty there is on calling List.AsQueryable().ToList()

Don't call AsQueryable, you only need to use Enumerable.Where.


thus won't prevent a 50 calls deep call stack

Depth of call stack is much much less important than having a highly effective filter first. If you can reduce the number of elements early, you reduce the number of method calls later.

Amy B
  • 108,202
  • 21
  • 135
  • 185
  • For the record - failure = not understanding and not looking up the methods being used. – Amy B Jun 17 '10 at 14:25