1

I have a method that reads data using SqlDataReader and yield returns an IEnumerable, e.g.:

IEnumerable<string> LoadCustomers()
{
 using(SqlDataReader rdr = cmd.ExecuteReader())
 {
    while (rdr.Read())
    {
        yield return rdr.GetString(0); 
    }
 }
}

Now let's assume I need only newest 10 customers. I could do

LoadCustomers.Take(10)

or pass 10 as a parameter to sql and make my sql

SELECT TOP 10 FROM Customers ORDER BY CreationDate DESC

According to this post the entire results set are being transmitted from sql server to the client even if the datareader reads only few rows (as long as the connection is open) - should I avoid the Take(10) approach because of that extra data being transmitted anyway to client or it would be a premature optimization to avoid it (because the yield return code would close the connection after it read 10 rows and then the data transmission would stop anyway)?

Community
  • 1
  • 1
BornToCode
  • 9,495
  • 9
  • 66
  • 83
  • 2
    `ORDER BY CreationDate DESC`? – jarlh Jul 27 '16 at 09:27
  • 4
    It would not be premature optimization. Take from the database only what you actually need. It makes no sense to select 10,000 customers when you only need 10. – Zohar Peled Jul 27 '16 at 09:29
  • 1
    You're misinterpreting what that post says. The entire result set is *not* transmitted to the client if you stop reading, though some rows may already be buffered. `SqlDataReader` does not "read ahead" beyond the network. The reason you still want to send `TOP(10)` to the database server in most cases, and why that's not a premature optimization, is because the optimizer can produce a more effective plan if it knows you only want 10 rows as opposed to reading the whole table (if nothing else, the query will allocate less memory in advance). – Jeroen Mostert Jul 27 '16 at 11:12
  • Another way to think of it: a premature optimization is when you start changing code that correctly expresses what you want in an attempt to make it faster, before even having established it's necessary. Using `SELECT` without a `TOP` even though you want only a limited number of rows *isn't* correctly expressing what you want from the point of view of the database. Which is kind of a big thing in SQL, since your declarations are the only thing the server has to produce query plans with. Taken to the extreme, it would mean all your queries become `SELECT * FROM [Table]`. – Jeroen Mostert Jul 27 '16 at 11:23
  • @JeroenMostert - But if I don't stop early enough (even though I stop sometime) the entire result set could be already buffered on the client, isn't it? +1 for the clarification about the more effective plan if I specify top 10 in the query. – BornToCode Jul 27 '16 at 11:46
  • 1
    It's theoretically possible for the entire result set to be buffered on the client side, yes. But that requires pretty big network buffers -- your average packet can't hold more than a few rows of your average table. More than 10, probably, but not, say, 10,000. There's no infinite readahead/buffering. `SqlDataReader` is not `DataTable` -- no rows are actually read until you ask for them, it's just that the first row will also pull along the next X. It's true that that too contributes to the inefficiency, since the server still retrieves and sends these rows. – Jeroen Mostert Jul 27 '16 at 11:59
  • @JeroenMostert - If you refactor your comments into an answer, I'll choose it as the accepted answer. – BornToCode Jul 28 '16 at 15:18

2 Answers2

2

Since whether or not optimization is "premature" is subjective, I choose to interpret this question as "does using a DataReader and stopping the read after 10 rows have the same performance characteristics as using TOP(10) in the query?"

The answer is no. Passing TOP(10) to the server allows the optimizer to tune reads, memory grants, I/O buffers, lock granularity and parallelism with the knowledge that the query will return (and in this case, also read) at most 10 rows. Leaving out the TOP means it has to prepare for the case where the client is going to be reading all rows -- regardless of whether you actually stop earlier.

It isn't true that the server will send rows whether you read them or not. Pulling rows with a SqlDataReader is conceptually a row-by-row operation: when you issue Reader.MoveNext, you fetch the next row from the server and only that row. But in the interest of performance, rows are buffered before you request them (both on the server end an in the network buffers). So it is possible to end up with, say, 100 rows retrieved in buffers after your first .MoveNext call, even if you only read 10 of them.

With regards to overhead, this would not be my primary concern because these buffers ultimately have a fixed size: the server will not go and buffer all rows of a result set regardless of how many there are (this would be very inefficient in general). If you only read 10 rows, whether your query would ultimately return 1,000 or 1,000,000 rows if it ran to completion isn't going to matter in terms of buffering, but primarily in terms of the query plan. Nevertheless, it does add to the overhead.

Jeroen Mostert
  • 27,176
  • 2
  • 52
  • 85
1

You could also make use pagination Skip(0) and Take(10) more flexibility.

SQL SERVER 2012

SELECT name,
       CreationDate        
  FROM customer
 ORDER BY
       CreationDate      
OFFSET @skip ROWS
FETCH NEXT @take ROWS ONLY;

SQL 2005 to 2008

SET @take = (@skip + @take)

;WITH customer_page_cte AS
(SELECT 
        name, 
        CreationDate,
        ROW_NUMBER() OVER (ORDER BY CreationDate desc) AS RowNumber
 FROM customer
)

SELECT name, 
       CreationDate
  FROM customer_page_cte
 WHERE RowNumber > @skip AND RowNumber <= @take

C# with sql 2012 - use stored procedure for command :)

var command = @"SELECT name,
                       CreationDate        
                  FROM customer
                 ORDER BY
                       CreationDate      
                 OFFSET @skip ROWS
                 FETCH NEXT @take ROWS ONLY;";

            using (var conn = new SqlConnection("Data Source=.;Initial Catalog=Stackoverflow;Integrated Security=True"))
            {
                conn.Open();
                using (var cmd = new SqlCommand(command, conn))
                {
                    cmd.Parameters.AddWithValue("skip", 0);
                    cmd.Parameters.AddWithValue("take", 10);

                    var reader = cmd.ExecuteReader();
                    while (reader.Read())
                    {
                        Console.WriteLine(reader.GetString(0));
                    }
                }
            }
Harminder
  • 2,209
  • 23
  • 20