I've been researching a solution to a performance problem with a system I'm responsible for, and I think at least part of the problem is due to database query performance. We use stored procedures to query "pages" of data in a pretty standard way. However, this paging appears to be more costly when the datasets get large.
Given this simple table populated with sample data:
create table Data (
Value uniqueidentifier not null,
constraint PK_Data primary key clustered (Value)
)
insert into Data
-- SeedTable has ~2M rows
select newid() from SeedTable
And this stored procedure to return paged data: (this requires Sql2012 apparently, though the Sql2008 style of using ROW_NUMBER() behaves the same):
create proc
GetDataPage @Offset int, @Count int
as
select Value
from Data
order by Value
offset @Offset rows
fetch next @Count rows only
I then test the performance of this sproc with this C# code:
const int PageSize = 50;
const int MaxCount = 50000;
using (var conn = new SqlConnection("Data Source=.;Initial Catalog=TestDB;Integrated Security=true;")) {
conn.Open();
int a = 0;
for (int i = 0; ; i += PageSize) {
using (var cmd = conn.CreateCommand()) {
cmd.CommandType = System.Data.CommandType.StoredProcedure;
cmd.CommandText = "GetDataPage";
var oid = cmd.CreateParameter();
var offset = cmd.CreateParameter();
offset.Value = i;
offset.ParameterName = "Offset";
cmd.Parameters.Add(offset);
var count = cmd.CreateParameter();
count.Value = PageSize;
count.ParameterName = "Count";
cmd.Parameters.Add(count);
var sw = Stopwatch.StartNew();
int c = 0;
using(var reader = cmd.ExecuteReader()) {
while (reader.Read()) {
c++;
}
}
a += c;
sw.Stop();
Console.WriteLine(sw.ElapsedTicks + "\t" + a);
if (c < PageSize || a >= MaxCount)
break;
}
}
}
When I chart the output of this code I get the following:
I would have expected that paging like this in SQL would have constant time performance, or perhaps logarithmic at worst, but it is pretty clear from the chart that performance is linear.
Are there any special tricks (hints) to make this work better?
Is there another approach to this that might be faster?
Do other databases behave the same way?
Changing the experimental code to use the "page from" technique, that Kevin Suchlicki suggests, results in the following:
Very impressive. This performance looks more like what I would expect/want. Now I just need to figure out if I can apply this to my real problem. The potential issue being that it doesn't allow for "random access" of the data, but a forward only cursor-like access. I'm aware that it must look like what I'm doing violates every notion of good database design.