
In the past, I've written Linq to SQL queries that haven't performed well. Using SQL Profiler (or similar), I can see how my query is translated to SQL by intercepting it at the database.

Is there a way to do this with Linq queries that operate solely on objects?

As an example, consider the following Linq query on a list of edges in a directed graph:

var outEdges = from e in Edges
               where e.StartNode.Equals(currentNode) &&
               !(from d in deadEdges select d.StartNode).Contains(e.EndNode)
               select e;

That code is supposed to select all edges that start from the current node except for those that can lead to a dead edge.

Now, I have a suspicion that this code is inefficient, but I don't know how to prove it apart from analysing the MSIL that's generated. I'd prefer not to do that.

Does anyone know how I could do this without SQL?

Edit:

When I talk about inefficiency, I mean inefficiency in terms of "Big O" notation or asymptotic notation. In the example above, is the code executing the Linq in O(n) or O(n log m) or even O(n.m)? In other words, what's the complexity of the execution path?

With Linq to SQL, I might see (for example) that the second where clause is being translated as a subquery that runs for each edge rather than a more efficient join. I might decide not to use Linq in that case or at least change the Linq so it's more efficient with large data sets.
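As a hypothetical illustration of that kind of rewrite (using the same `Edges`, `deadEdges` and `currentNode` identifiers from the example above), the anti-semijoin could be expressed with a group join, which providers typically translate to a join rather than a per-row subquery; with Linq to Objects it also avoids rescanning `deadEdges` for every edge:

```csharp
// Hypothetical rewrite of the query above using a group join
// ("join ... into"): edges are kept only when the group of matching
// dead edges is empty.
var outEdges = from e in Edges
               where e.StartNode.Equals(currentNode)
               join d in deadEdges on e.EndNode equals d.StartNode into dead
               where !dead.Any()
               select e;
```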

Edit 2:

Found this post - don't know how I missed it in the first place. Just searching for the wrong thing I guess :)

Damovisa
  • Inefficient in comparison to what? At the bottom line you have two options that I can see, throw a timer around the execution of this query and the alternative and see which is faster or look at the IL for each and make a determination based on your understanding of the apparent execution speed of each IL instruction. At the end of the day, if you write a poor query you'll get poor performance whether in SQL or Linq. – Lazarus Oct 27 '09 at 14:35
  • Perhaps I didn't give a good example, but I meant inefficient in comparison to how it _could_ be written. I'm talking about "Big O" notation or asymptotic notation. Testing by timing is difficult unless you give it an inefficient object tree, and frankly IL is difficult to read. I'll clarify in the question. – Damovisa Oct 27 '09 at 23:01

1 Answer


I don't think you need a profiler for that...

Linq to SQL (or Linq to Entities) queries are translated to another language (SQL) and then executed using an optimized execution plan, so it's hard to see exactly what happens; for that kind of scenario, a profiler can be helpful. On the other hand, Linq to Objects queries are not translated; they are executed "as is". A Linq to Objects query using the SQL-like syntax is just syntactic sugar for a series of method calls. In your case, the full form of the query would be:

var outEdges = Edges.Where(e => e.StartNode.Equals(currentNode) &&
                           !deadEdges.Select(d => d.StartNode).Contains(e.EndNode));

So, basically, you iterate over Edges, and for each item in Edges you iterate over deadEdges. So the complexity here is O(n.m), where n is the number of items in Edges and m is the number of items in deadEdges.
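You can get that down to O(n + m) by building the set of dead start nodes once and doing a hash lookup per edge. Here is a minimal self-contained sketch of that idea; the `Edge` record and integer node IDs are stand-ins for the question's actual graph types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal edge type standing in for the question's graph classes.
record Edge(int StartNode, int EndNode);

class Demo
{
    static void Main()
    {
        var edges = new List<Edge> { new(1, 2), new(1, 3), new(2, 4) };
        var deadEdges = new List<Edge> { new(3, 5) };
        int currentNode = 1;

        // Precompute the dead start nodes once: O(m) to build,
        // O(1) average per Contains lookup, so the query is O(n + m).
        var deadStartNodes = new HashSet<int>(deadEdges.Select(d => d.StartNode));

        var outEdges = edges.Where(e => e.StartNode == currentNode &&
                                        !deadStartNodes.Contains(e.EndNode));

        foreach (var e in outEdges)
            Console.WriteLine($"{e.StartNode} -> {e.EndNode}");
        // Prints only "1 -> 2": the edge 1 -> 3 is excluded because
        // it leads into the dead edge 3 -> 5.
    }
}
```

The trade-off is the extra O(m) memory for the HashSet, which is usually worth it once deadEdges is more than a handful of items.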

Thomas Levesque
  • Thanks for that, I think I understand it a bit better. So what that means that if there's a shortcut (like prepopulating a hashlist of start nodes from deadEdges) I'll probably have to do it manually? – Damovisa Oct 28 '09 at 23:15
  • No problem - I meant if there's a more efficient way to do it, I'd have to write it manually rather than using Linq – Damovisa Nov 20 '09 at 06:52