13

I have a paging API that returns rows a user requests, but only so many at one time, not the entire collection. The API works as designed, but I do have to calculate the total number of records that are available (for proper page calculations). Within the API, I use Linq2Sql and I work a lot with the IQueryable before i finally make my requests. When I go to get the count, I call something like: totalRecordCount = queryable.Count();

The resulting SQL is interesting none the less, but it also adds an unnecessary Order By which makes the query very expensive.

exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM (
    SELECT TOP (1) NULL AS [EMPTY]
    FROM [dbo].[JournalEventsView] AS [t0]
    WHERE [t0].[DataOwnerID] = @p0
    ORDER BY [t0].[DataTimeStamp] DESC
    ) AS [t1]',N'@p0 int',@p0=1

Because I am using the IQueryable, I can manipulate the IQueryable prior to it making it to the SQL server.

My question is, if I already have an IQueryable with a OrderBy in it, is it possible to remove that OrderBy before I call the Count()?

like: totalRecordCount = queryable.NoOrder.Count();

If not, no biggie. I see many questions how to OrderBy, but not any involving removing an OrderBy from the Linq expression.

Thanks!

TravisWhidden
  • 2,142
  • 1
  • 19
  • 43

6 Answers6

10

So, the below code is a spike against an in-memory array. There may be some hurdles to get this working with Entity Framework (or some other arbitrary IQueryProvider implementation). Basically, what we are going to do is visit the expression tree and look for any Ordering method call and simply remove it from the tree. Hope this points you in the right direction.

class Program
{
    static void Main(string[] args)
    {
        var seq = new[] { 1, 3, 5, 7, 9, 2, 4, 6, 8 };

        var query = seq.OrderBy(x => x);

        Console.WriteLine("Print out in reverse order.");
        foreach (var item in query)
        {
            Console.WriteLine(item);
        }

        Console.WriteLine("Prints out in original order");
        var queryExpression = seq.AsQueryable().OrderBy(x => x).ThenByDescending(x => x).Expression;

        var queryDelegate = Expression.Lambda<Func<IEnumerable<int>>>(new OrderByRemover().Visit(queryExpression)).Compile();

        foreach (var item in queryDelegate())
        {
            Console.WriteLine(item);
        }


        Console.ReadLine();
    }
}

public class OrderByRemover : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType != typeof(Enumerable) && node.Method.DeclaringType != typeof(Queryable))
            return base.VisitMethodCall(node);

        if (node.Method.Name != "OrderBy" && node.Method.Name != "OrderByDescending" && node.Method.Name != "ThenBy" && node.Method.Name != "ThenByDescending")
            return base.VisitMethodCall(node);

        //eliminate the method call from the expression tree by returning the object of the call.
        return base.Visit(node.Arguments[0]);
    }
}
Craig Wilson
  • 12,174
  • 3
  • 41
  • 45
  • Just for the example, that definitely deserves an up-vote. Technically could create a nice extension method for Linq. I already did my re-factor, but this also could a possible answer to this. Technically this answers the question more then the work around we did above. Anyone else agree? I haven't tested it. – TravisWhidden May 15 '12 at 04:31
6

There isn't just an unneeded ORDER BY, there's also a spurious TOP(1).

SELECT TOP (1) NULL AS [EMPTY] ...

That subselect will only return 0 or 1 rows. In fact without the TOP there it wouldn't be legal to have an ORDER BY in a subselect.

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.: SELECT COUNT(*) FROM ( SELECT * FROM Table1 ORDER BY foo )

sqlfiddle

I think you have probably done something wrong in your LINQ. Are you sure you haven't written .Take(1) or similar somewhere in your query, before calling .Count()?

This is wrong:

IQueryable<Foo> foo = (...).OrderBy(x => x.Foo).Take(1);
int count = foo.Count();

You should do this instead:

IQueryable<Foo> foo = (...);
Iqueryable<Foo> topOne = foo.OrderBy(x => x.Foo).Take(1);
int count = foo.Count();
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • Totally understand what you are saying.. I know its generating some strange sql. But really more looking for the answer to the question, not really about what the SQL looks like. But yes... it is quite fugly. – TravisWhidden May 14 '12 at 21:07
  • Mark said that this query is not counting anything. Are you sure this is the right query? Does it return a count greater 1? – usr May 14 '12 at 21:14
  • Just to clarify, Normally there isn't a Top(1), but in this example I requested just one record from our api and it still would call the .Count() even though the count could be 1 or 0. Usually there is a top 100000. the Count() appears to wrap the actual query results, but the order by is not really needed and expensive to the query. Removing the OrderBy before the count would reduce the SQL cost drastically while still being accurate in the total number of records to be returned. Hope this clears it up, because yes at first glance I would be like "wtf is that guy doing" – TravisWhidden May 14 '12 at 21:16
  • Usr yes -- still returns 1 as the recordset. Not arguing the linq2sql code that is generated is at all sexy. I just was looking for a solution to remove the OrderBy from the IQueryable before the .Count(). – TravisWhidden May 14 '12 at 21:19
  • 1
    @TravisWhidden: What are you trying to count? Are you trying to count the number of results you just got? Or the number of results you would have got if you had fetched all of them instead of just the first n? If you want the latter, use the code I posted in my answer. If you want the former, then since you already are fetching the results from the database the simplest (and best performing) is to count them in memory rather than sending a new query to the database. – Mark Byers May 14 '12 at 21:27
  • Exactly, the number of rows if I were to request them all at once, but in this call, I requested only one row to be returned with a max of 1 possible row. I had already started to refactor to just apply the sort after the Count() which is almost identical to what you suggested, but now this thread has me interested in finding out a possible work around without refactoring. – TravisWhidden May 14 '12 at 21:51
  • @MarkByers So this seems like an acceptable answer. Same idea I was thinking about doing but wanted to see if there was a more elegant way. So far its not looking like its possible with a simple call (like usr would have been). I will hold out for a little bit before granting the answer just in case someone has a better solution, but will definitely up vote. Thanks for the help. – TravisWhidden May 14 '12 at 22:12
3

I am afraid there is no easy way to remove the OrderBy operator from queryable.

What you can do, however, is to re-create the IQueryable based on the new expression obtained from rewriting queryable.Expression(see here) omitting the OrderBy call.

Krizz
  • 11,362
  • 1
  • 30
  • 43
2

If you can't eliminate the root cause, here is a workaround:

totalRecordCount = queryable.OrderBy(x => 0).Count();

SQL Server's query optimizer will remove this useless ordering. It won't have runtime cost.

usr
  • 168,620
  • 35
  • 240
  • 369
  • I gave this a whirl, but without luck. I was hoping it would remove the OrderBy and OrderByDescending expression for the IQueryable. count = queryResults.OrderBy(x => 0).OrderByDescending(x => 0).Count(); but the resulting still contained the Order By clause. By passing an OrderBy, it should replace the existing OrderBy right, not staggers them right? – TravisWhidden May 14 '12 at 21:48
  • 1
    I don't think that adding a new `OrderBy` clause will make the old one go away. `.OrderBy(x => x.Foo).OrderBy(x => x.Bar)` gives the same results as `.OrderBy(x => x.Bar).ThenBy(y => y.Foo)`. Besides, you don't have that, you have `.OrderBy(x => x.Foo).Take(1).OrderBy(x => 0)`. – Mark Byers May 14 '12 at 21:59
  • Yea, Linq2Sql did omit the x => 0 but kept the original orderby. I think adding additional OrderBy's to the expression must make it similar to ThenBy. As elegant/hacky as this solution sounded, it didn't work :( – TravisWhidden May 14 '12 at 22:02
  • This will indeed stagger order-bys but the query optimizer removes them. Look at the query plan to confirm this. Add 10 order-bys and you won't see 10 sorts. This general approach is sometimes a good way to work around a L2S limitation. Just rely on the optimizer to remove crap. – usr May 14 '12 at 22:03
  • I have a factory class that returns a IQueryable and within that factory method I was providing a default sort order (finalQuery = finalQuery.OrderByDescending(dataEvent => dataEvent.DataTimeStamp)). After calling the factory method and then adding the additional sorts, such as (x => 0), it still maintained that original OrderByDescending expression in the final sql output. I am verifying using SQL Server Profiler. I can tell it didn't remove them out of the query plan. The query was taking 12 seconds to execute before, and after I removed all OrderBy's, 300 ms. – TravisWhidden May 14 '12 at 22:18
  • +1 for "look at the query plan". TravisWhidden: http://msdn.microsoft.com/en-us/library/ms191194.aspx – Amy B May 15 '12 at 03:17
0

I think you have implemented you paging code wrongly. You actually need to query the database twice, once for the paged datasource and once for the total row count. This is how the setup should look.

public IList<MyObj> GetPagedData(string filter, string sort, int skip, int take)
{
   using(var db = new DataContext())
   {
      var q = GetDataInternal(db);
      if(!String.IsNullOrEmpty(filter))
         q = q.Where(filter); //Using Dynamic linq

      if(!String.IsNullOrEmpty(sort))
         q = q.OrderBy(sort); //And here

      return q.Skip(skip).Take(take).ToList();
   }
}

public int GetTotalCount(string filter)
{
    using(var db = new DataContext())
    {
       var q = GetDataInternal(db);
       if(!String.IsNullOrEmpty(filter))
         q = q.Where(filter); //Using Dynamic linq

       return q.Count(); //Without ordering and paging.
    }
}

private static IQuerable<MyObj> GetDataInternal(DataContext db)
{
   return 
        from x in db.JournalEventsView 
        where ...
        select new ...;
}

The filtering and sorting is done using the Dynamic linq library

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • Thanks for the input. I do actually query it twice. The Count() is for the page count, but I use the same queryable for the result set. which I use later when I do a Skip, Take: var queryResultsList = orderdResults.Skip((result.ReturnValue.CurrentPage) * result.ReturnValue.RecordsPerPage).Take(result.ReturnValue.RecordsPerPage).ToList(); – TravisWhidden May 14 '12 at 22:04
  • from the generated sql code in your question it looks like you are doing `Count()` on the `IQuerable` object _after_ `OrderBy` and `skip/take` is applied – Magnus May 14 '12 at 22:08
  • Nah, the count was being executed after the orderby, but before the skip/take. That was the main reason I wanted to remove the orderby, but I just created two IQueryable's as suggested above, one with, one without the sort. – TravisWhidden May 14 '12 at 23:30
  • @TravisWhidden A linq statement that looks like this `tblUsers.OrderBy (u => u.fkCompanyID).Count()` linq will just strip the orderby from the generated SQL code: `SELECT COUNT(*) AS [value] FROM [dbo].[tblUser] AS [t0]` – Magnus May 15 '12 at 09:52
  • I was wrong in what I said yesterday. I forgot that I had a coded "Max" possible results that is allowed from the query. This is to say there can be "x" max records returned ore less, but never more. The Count happened after the (Take), so there would be correctly calculated pages. Thats why it resulted in such an ugly sql. – TravisWhidden May 15 '12 at 21:34
  • IE: There are 150,000 records that match the query, but I dont want anymore then 100k. it would return 100k as the final count, however if I had 75k records that matched, with a max of 100k, it would return the 75k. Your paging example doesnt reflect a "max" record count. It was something we did to prevent a possible memory problem with ridiculous queries that could happen via the API. – TravisWhidden May 15 '12 at 21:37
  • @TravisWhidden Since the paging is performed in the database I don't see why you would need a max and why there could be memory issues? – Magnus May 16 '12 at 09:04
  • As a safeguard for our server really. If the consumer of the API were to request page 1 with 100m records and we didn't have a coded max records, the server would have to download all those records from the sql server, then process them into the associated linq2sql object, then go through our process of turning them into API objects.... then gzipping the response... etc, you can see what I mean. – TravisWhidden May 17 '12 at 06:30
0

I know it is not quite what you are looking for, but index on [DataOwnerID] with inclusion of DataTimeStamp could make your query less expensive.

Val Bakhtin
  • 1,434
  • 9
  • 11