Is it possible to do data paging in SQL Server (2012) in constant time?

Question

I've been researching a solution to a performance problem with a system I'm responsible for, and I think at least part of the problem is due to database query performance. We use stored procedures to query "pages" of data in a pretty standard way. However, this paging appears to be more costly when the datasets get large.

Given this simple table populated with sample data:

create table Data (
Value uniqueidentifier not null,
constraint PK_Data primary key clustered (Value)
)

insert into Data 
-- SeedTable has ~2M rows
select newid() from SeedTable

And this stored procedure to return paged data: (this requires Sql2012 apparently, though the Sql2008 style of using ROW_NUMBER() behaves the same):

create proc
GetDataPage @Offset int, @Count int
as

select Value
from Data
order by Value
offset @Offset rows
fetch next @Count rows only

I then test the performance of this sproc with this C# code:

const int PageSize = 50;
const int MaxCount = 50000;

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=TestDB;Integrated Security=true;")) {

  conn.Open();
  int a = 0;
  for (int i = 0; ; i += PageSize) {
    using (var cmd = conn.CreateCommand()) {
      cmd.CommandType = System.Data.CommandType.StoredProcedure;
      cmd.CommandText = "GetDataPage";
      var oid = cmd.CreateParameter();

      var offset = cmd.CreateParameter();
      offset.Value = i;
      offset.ParameterName = "Offset";
      cmd.Parameters.Add(offset);

      var count = cmd.CreateParameter();
      count.Value = PageSize;
      count.ParameterName = "Count";
      cmd.Parameters.Add(count);

      var sw = Stopwatch.StartNew();
      int c = 0;
      using(var reader = cmd.ExecuteReader()) {
        while (reader.Read()) {
          c++;
        }              
      }
      a += c;

      sw.Stop();
      Console.WriteLine(sw.ElapsedTicks + "\t" + a);

      if (c < PageSize || a >= MaxCount)
        break;
    }
  }
}

When I chart the output of this code I get the following: Linear

I would have expected that paging like this in SQL would have constant time performance, or perhaps logarithmic at worst, but it is pretty clear from the chart that performance is linear.

Are there any special tricks (hints) to make this work better?

Is there another approach to this that might be faster?

Do other databases behave the same way?

Changing the experimental code to use the "page from" technique, that Kevin Suchlicki suggests, results in the following:

Page From ID

Very impressive. This performance looks more like what I would expect/want. Now I just need to figure out if I can apply this to my real problem. The potential issue being that it doesn't allow for "random access" of the data, but a forward only cursor-like access. I'm aware that it must look like what I'm doing violates every notion of good database design.

don't cluster on a uniqueidentifier datatype, if you have to use newsequantialid() not newid() to populate the table. newid() will fragment the table — SQLMenace, Jun 20 '13 at 20:08

Kevin Suchlicki · Accepted Answer · 2013-06-20T21:39:40.620

The most obvious possibility is in the app design itself. Offer your users filter criteria. Users usually have some idea what they are looking for and would rather not page thru 1000 pages of returned results. How often do you pass page 10 on a google search?

Having said that, you could try storing the id (clustered index value) of the last row returned on the previous page and use that in your SQL where clause. If you need to allow sorting on different keys (e.g. last name), then store the clustered index id value and the final last name of the previous page. Then write your SQL like this (you always need to order on your key field and clustered id value in order to deterministically order the records in the case of duplicate key values):

select top (@count) Id, LastName, FirstName
from Data
where LastName >= @previousLastName and Id > @previousId
order by LastName, Id

You would also want to index all the fields that could be sort keys. Not sure how the above would perform but I would expect the search on indexed fields would perform O(log n).

Another option might be to persist the full list, in order, with row value, every time the source data changes, behind the scenes, and have the app pull from the persisted table.

Good question... Let us know how it turns out please!

Looks no worse than O(log n), see added notes above. Good tip. More research needed though. Thanks. — MarkPflug, Jun 20 '13 at 23:25
Hi Mark, good research. you could use the same technique to move backwards thru the data as well. But I guess for random access of a given page, you will need your original code. Could you do some hybrid, using your code to go to a particular page, and mine for paging forward and back from there? — Kevin Suchlicki, Jun 21 '13 at 09:09
But if you really want to go hard-core, try the other technique I mentioned, of persisting the ordered results to a table with the row numbers. this could work great if the underlying data changes infrequently. use a trigger on the table to add a "rebuild" message to a service broker queue which could then asynchronously rebuild the paged data into a backup table and then perform a table swap with your existing paged data. not an easy solution but rock-solid effectiveness I expect (again, only if your data changes infrequently, say every 5+ minutes or so). — Kevin Suchlicki, Jun 21 '13 at 09:18

Is it possible to do data paging in SQL Server (2012) in constant time?

1 Answers1