Linq to objects - boondoggle?

Question

I thought that LINQ to objects is great way for differed execution of joined data. In reality, it comes up as bad way to do things...

Here is what we had, having few, few hundred, 3K and 3.5K records in a,b,c,d correspondingly

IEnumerable<MyModel> data =
    (from a in AList
     from b in BList.Where(r => r.AId == a.Id)
     from c in CList.Where(r => r.BId == b.Id)
     from d in DList.Where(r => r.SomeId == myId && r.Some2Id == c.Some2Id)
 //  . . . . . .

Wasn't LINQ supposed to be great about doing it? In reality, following works much faster, 60 times faster in fact

var dTemp = DList.Where(r => r.SomeId == myId).ToList();
var cTemp = CList.Where(c => dTemp.Any(d => d.Some2Id == c.Some2Id)).ToList();

IEnumerable<MyModel> data =
    (from a in AList
     from b in BList.Where(r => r.AId == a.Id)
     from c in cTemp.Where(r => r.BId == b.Id)
 //  . . . . . .

And then I came across this article

Q: Is there a way to improve this query without abandoning single LINQ?

Or does this mean that LINQ to objects in form of joins need to be avoided and replaced by some sequential calls if performance is at stake?

It looks like you're trying to inner join `a`, `b`, and `c`. Have you tried using the [`Join`](https://learn.microsoft.com/en-gb/dotnet/api/system.linq.enumerable.join?view=netframework-4.7) function instead of the `where`'s? — Reddog, Sep 22 '17 at 00:25
Consider pessimizing your LINQ less aggressively. "I'm infallible so the framework must be broken" is usually not the best place to start. — 15ee8f99-57ff-4f92-890c-b56153, Sep 22 '17 at 00:36
@EdPlunkett Please. I never say it is broken. But looks like, since optimizer is required, it is not quite does the job — T.S., Sep 22 '17 at 02:19
@Reddog Yes, now we are optimizing to that. That was just bad practice. In some instances it works the same but `join` is faster — T.S., Sep 22 '17 at 13:46

score 3 · Accepted Answer · answered Sep 22 '17 at 00:37

Let's analyze the differences.

First query: you are performing a filter on BList, a filter on CListand two filters on DList, all in a deferred-execution manner. You then use a kind of a join.
Second query: you perform a static filter on DList and evaluate it, another static filter on CList based on DList and evaluate it and then a deferred-executed filter on both AList and BList.

The second query is faster because:

DList is not being looked at for useless values (due to previous filters)
CList only contains useful values due to previous filters

Anyway, both queries are wrong. Multiple from is basically a cross-join, as explained here. As @Reddog commented, the best way is to actually use Join:

var data = from a in AList
           join b in BList on a.Id equals b.AId
           join c in CList on b.Id equals c.BId
           join d in DList on c.Some2Id equals d.Some2Id
           where d.SomeId == someId;

You're right. Under some conditions `from-from` works same as `from-join`. And with small data sets we never had an issue. But this one client had a unique setup, in which they had large [huge] sets in `c` and `d` and it wasn't doing too well. Even so, it wasn't that bad -140 ms but under that setup it was repeatedly ran. So 140x20... Now it is 2 ms. 2x20 - very different result. We would tune it before but we bench-marked against SQL calls that cached data replaced and it was doing good. So, nobody paid attention to a little LINQ fact. Thank you — T.S., Sep 22 '17 at 13:43

Linq to objects - boondoggle?

1 Answers1