
I have a database that contains 250,000 records. I am using a DataReader to loop the records and export to a file. Just looping the records with a DataReader and no WHERE conditions is taking approx 22 minutes. I am only selecting two columns (the id and a nvarchar(max) column with about 1000 characters in it).

Does 22 minutes sound correct for SQL Server Express? Would the 1GB RAM or single-CPU limit have an impact on this?

  • More testing - installed full version of SQLServer 2008 R2 on same machine with same database. The DataReader looped the 250,000 records in 4.3 minutes vs the 22 minutes with SQLExpress. – econner Jan 02 '12 at 07:04
  • You say you're only accessing ~1k characters, but how large is the actual table? Run `exec sp_spaceused myTable` (replace `myTable` with your table name). The Max size for a single record of `NVARCHAR(MAX)` is quite large, and since you won't / can't have in index on the `NVARCHAR` field you will be requesting the entire row, so if there's another column that is say 10KB per row, your 250k rows are actually 2.5GB etc which means it can't all fit in RAM. – Seph Jan 04 '12 at 10:39
  • rows = 255,000. reserved = 1994320 KB, data = 1911088 KB, index_size = 82752 KB, unused 480KB – econner Jan 04 '12 at 17:37

2 Answers


22 minutes sounds way too long for a single basic (non-aggregating) SELECT against 250K records (even 22 seconds sounds awfully long for that to me).

To pin down why, it would help if you could post some code and your schema definition. Do you have any triggers configured?

With 1K characters in each record (2KB), 250K records (500MB) should fit within SQL Express' 1GB limit, so memory shouldn't be an issue for that query alone.

Possible causes of the performance problems you're seeing include:

  • Contention from other applications
  • Having rows that are much wider than just the two columns you mentioned
  • Excessive on-disk fragmentation of either the table or the DB MDF file (a quick check for row width and fragmentation is sketched after this list)
  • A slow network connection between your app and the DB
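
If you want to rule out the wide-row and fragmentation possibilities, one way is to query the physical-stats DMV. This is only a sketch; `dbo.myTable` is a placeholder for your actual table name:

-- Check average row size and fragmentation (SAMPLED mode is needed for avg_record_size_in_bytes)
SELECT index_id,
       avg_record_size_in_bytes,       -- how wide the rows really are
       avg_fragmentation_in_percent,   -- logical fragmentation of the index or heap
       page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL, 'SAMPLED');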

Update: I did a quick test. On my machine, reading 250K 2KB rows with a SqlDataReader takes under 1 second.

First, create a test table with 256K rows (this only took about 30 seconds):

CREATE TABLE dbo.data (num int PRIMARY KEY, val nvarchar(max))
GO
DECLARE @txt nvarchar(max)
SET @txt = N'put 1000 characters here....'
INSERT dbo.data VALUES (1, @txt);
GO 
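-- Each run of the next batch doubles the row count: 1 row doubled 18 times (GO 18) = 262,144 rows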
INSERT dbo.data 
    SELECT num + (SELECT COUNT(*) FROM dbo.data), val FROM dbo.data 
GO 18
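
To sanity-check the doubling, a quick count (it should report 262,144 rows):

SELECT COUNT(*) FROM dbo.data;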

Test web page to read data and display the statistics:

using System;
using System.Collections;
using System.Data.SqlClient;
using System.Text;

public partial class pages_default : System.Web.UI.Page
{
    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        using (SqlConnection conn = new SqlConnection(DAL.ConnectionString))
        {
            using (SqlCommand cmd = new SqlCommand("SELECT num, val FROM dbo.data", conn))
            {
                conn.Open();
                conn.StatisticsEnabled = true;
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                    }
                }
                StringBuilder result = new StringBuilder();
                IDictionary stats = conn.RetrieveStatistics();
                foreach (string key in stats.Keys)
                {
                    result.Append(key);
                    result.Append(" = ");
                    result.Append(stats[key]);
                    result.Append("<br/>");
                }
                this.info.Text = result.ToString(); // "info" is a control (e.g. a Label or Literal) declared in the page markup
            }
        }
    }
}

Results (ExecutionTime in milliseconds):

IduRows = 0
Prepares = 0
PreparedExecs = 0
ConnectionTime = 930
SelectCount = 1
Transactions = 0
BytesSent = 88
NetworkServerTime = 0
SumResultSets = 1
BuffersReceived = 66324
BytesReceived = 530586745
UnpreparedExecs = 1
ServerRoundtrips = 1
IduCount = 0
BuffersSent = 1
ExecutionTime = 893
SelectRows = 262144
CursorOpens = 0

I repeated the test with SQL Enterprise and SQL Express, with similar results.

Capturing the "val" element from each row increased ExecutionTime to 4093 ms (string val = (string)reader["val"];). Using DataTable.Load(reader) took about 4600 ms.

Running the same query in SSMS took about 8 seconds to capture all 256K rows.

RickNZ
    +1, but please put the `SqlDataReader` into a `using` block as well. – John Saunders Jan 02 '12 at 08:02
  • what results do you get if your read from a field in the datareader loop? for example, string test = reader["val"].ToString(); – econner Jan 03 '12 at 01:31
  • Reading the "val" field increases ExecutionTime to 4093 ms. Running the query from SSMS, including capturing all result rows, takes about 8 seconds. – RickNZ Jan 03 '12 at 02:49
  • OK, for the pedants out there, I put the SqlDataReader in a using block--I'm a *big* "user of usings," but hey, this is just a one-time benchmark, not production code! – RickNZ Jan 03 '12 at 03:06
  • @econner do you have the DB on the same machine as your app? The query is returning 500MB+ -- if you have a slow network connection, that could be contributing to your performance issues. – RickNZ Jan 03 '12 at 03:11
  • More testing ... I restored a backup to both SQLExpress and the full version to ensure the same database was being used for testing. I then used Microsoft SQL Server Management Studio to run a query of "SELECT id, val FROM Table1" against the SQLExpress instance and also the full version. The full version completed in 18 seconds. However, the same query in Studio against the SQLExpress server took 10 minutes 20 seconds to complete. – econner Jan 03 '12 at 20:19
  • I am using SQL Server 2008 R2 SQLExpress and full version (180 day trial). The computer is 64bit, AMD Athlon II X2 220 Processor 2.80GH with 8GB of RAM with Windows 7 Professional SP1 – econner Jan 03 '12 at 20:19

Your results from running `exec sp_spaceused myTable` provide a potential hint:

rows = 255,000
reserved = 1994320 KB
data = 1911088 KB
index_size = 82752 KB
unused = 480 KB

The important thing to note here is reserved = 1994320 KB, meaning your table occupies roughly 1.9 GB on disk. When reading fields that are not indexed (and NVARCHAR(MAX) cannot be indexed), SQL Server must read the entire row into memory before restricting it to the columns you asked for. Hence you're easily running past the 1GB RAM limit.

As a simple test, delete the last (or first) 150k rows, try the query again, and see what performance you get.
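
If deleting rows isn't an option, a less destructive variant of the same test is to copy a subset into a scratch table and time the read against that instead. A rough sketch; the table and column names (myTable, id, comments) are assumptions carried over from the discussion above:

-- Copy ~100k rows into a scratch table and repeat the timing against it
SELECT TOP (100000) id, comments
INTO dbo.myTable_sample
FROM dbo.myTable
ORDER BY id;

SELECT id, comments FROM dbo.myTable_sample;  -- time this read vs. the full table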

A few questions:

  • Does your table have a clustered index on the primary key (is it the id field or something else)?
  • Are you sorting on a column that is not indexed, such as the `nvarchar(max)` field?
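
A quick way to answer both questions, assuming the table is dbo.myTable, is to list its indexes:

-- Shows every index on the table, including whether the PK index is clustered
EXEC sp_helpindex 'dbo.myTable';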

In the best-case scenario your PK is id, it is also the clustered index, and you either have no ORDER BY or you ORDER BY id.

Assuming your nvarchar(max) field is named comments:

SELECT id, comments
FROM myTable
ORDER BY id

This will work fine, but it requires reading all the rows into memory (it will only make one pass over the table, though). Since comments is NVARCHAR(MAX) and cannot be indexed, and the table is about 2GB, SQL Server will have to load the table into memory in parts.

Likely what is happening is you have something like this:

SELECT id, comments
FROM myTable
ORDER BY comment_date

Where comment_date is an additional field that is not indexed. In this case SQL Server would be unable to sort all the rows in memory and would end up paging the table in and out of memory several times, likely causing the problem you are seeing.

A simple solution in this case is to add an index to comment_date.
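
For example (a sketch only, since comment_date is a hypothetical column standing in for whatever your real ORDER BY column is):

-- Hypothetical index; substitute your actual sort column and table name
CREATE NONCLUSTERED INDEX IX_myTable_comment_date
    ON dbo.myTable (comment_date);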

But suppose that is not possible because you only have read access to the database; another solution is to make a local copy of the data you want in a table variable and sort that:

DECLARE @T TABLE
(
    id BIGINT,
    comments NVARCHAR(MAX),
    comment_date DATE
)

INSERT INTO @T SELECT id, comments, comment_date FROM myTable

SELECT id, comments
FROM @T
ORDER BY comment_date

If this doesn't help, additional information is required: please post your actual query along with your entire table definition and the indexes on it.

Beyond all of this, run the following after you restore backups to rebuild indexes and update statistics; you could just be suffering from corrupted statistics (which happens when you back up a fragmented database and then restore it to a new instance):

EXEC [sp_MSforeachtable] @command1="RAISERROR('UPDATE STATISTICS(''?'') ...',10,1) WITH NOWAIT UPDATE STATISTICS ? "

EXEC [sp_MSforeachtable] @command1="RAISERROR('DBCC DBREINDEX(''?'') ...',10,1) WITH NOWAIT DBCC DBREINDEX('?')"

EXEC [sp_MSforeachtable] @command1="RAISERROR('UPDATE STATISTICS(''?'') ...',10,1) WITH NOWAIT UPDATE STATISTICS ? "
Seph
  • The id column is of type Guid; it is the PK and the PK index is clustered. At this time I am not setting a WHERE condition or ORDER BY condition but rather just doing a SELECT * and looping all 250,000 records. I am now running the 3 commands you provided. Also I will remove 5000 records from the table and see if it makes a difference and then post back the results – econner Jan 06 '12 at 05:35