
On our SQL Server database, we have a table partitioned on a date column. We inserted half a million records for each date. We noticed that query performance was good for dates close to the partition start range and gradually degraded for each subsequent date.

Here is my partition function

CREATE PARTITION FUNCTION partition_position_monthly (DATE)
AS RANGE RIGHT FOR VALUES ('2019-09-01', '2019-11-01');
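The table below is created on partition_scheme_monthly, which is not shown in the question. A scheme mapping this function would be defined along these lines (the ALL TO ([PRIMARY]) filegroup mapping here is an assumption, for illustration only):

    -- Assumed scheme definition; the original question only names it.
    -- Maps every partition of the function to the PRIMARY filegroup.
    CREATE PARTITION SCHEME partition_scheme_monthly
    AS PARTITION partition_position_monthly
    ALL TO ([PRIMARY]);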

Table

CREATE TABLE PartitionTest(
    ID int IDENTITY(1,1) NOT NULL,
    col1 varchar(256),
    col2 varchar(128),
    col3 varchar(128),
    BusinessDate date -- partition column
) ON partition_scheme_monthly(BusinessDate)

There is a clustered index on the BusinessDate column.

Here is the query used

select top 1000 * from PartitionTest where BusinessDate = ?

CPU and IO recordings for each business date

BusinessDate = 2019-09-01 CPU time = 31 ms Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0

BusinessDate = 2019-09-02 CPU time = 63 ms Scan count 1, logical reads 24905, physical reads 0, read-ahead reads 3131

BusinessDate = 2019-09-03 CPU time = 125 ms Scan count 1, logical reads 49727, physical reads 0, read-ahead reads 7

BusinessDate = 2019-09-04 CPU time = 172 ms Scan count 1, logical reads 74551, physical reads 0, read-ahead reads 7

BusinessDate = 2019-09-05 CPU time = 234 ms Scan count 1, logical reads 99376, physical reads 0, read-ahead reads 117

As you can see, the CPU time and logical reads kept increasing for BusinessDates farther from the partition start range.

Is this the expected behaviour while fetching data from a partition?

We plan to partition the data monthly, and the query response time for days towards the end of the month is beyond our acceptable limits. Is there a way to achieve constant CPU time and logical reads for each day in the partition?

PasteThePlan link: https://www.brentozar.com/pastetheplan/?id=rk3eMdgcH

Martin Smith
suman M A
    Partitions are a data management feature, not a performance feature. Performance degrades if a query has to search multiple partitions. In this case though are you *really* trying to read 500K rows at once? Does that mean that one month contains 15M rows? With such low selectivity the `BusinessDate` index isn't very useful – Panagiotis Kanavos Oct 25 '19 at 12:06
  • This test reads only the top 1000 rows for a given date. Our query always fetches data for a single business date, so we won't be searching for data across partitions. What's weird is that the number of logical reads keeps increasing for each succeeding day. Doesn't SQL Server have a way to read data for a date directly within the partition? Why does it have to scan sequentially from the beginning of the partition? – suman M A Oct 25 '19 at 12:26
  • Top 1000 based on what order? If only `BusinessDate` is used as a clustered key, it means the rest of the rows are in random order. With 500K rows per date, the server may well decide to use a parallelized execution plan - btw you didn't post the execution plan. In any case, logical reads are *logical* - they read the cache, not the file. It's almost as if the BusinessDate index isn't used at all - or the server decided that since it has to look for 1/60th of the data, it's faster to do a scan than a seek – Panagiotis Kanavos Oct 25 '19 at 12:27
  • 1
    Upload your actual plan to https://www.brentozar.com/pastetheplan/. – Dan Guzman Oct 25 '19 at 12:38
  • Please post the execution plan. I bet the bad selectivity, `select *` and the data size results in a parallelized table scan since the index doesn't contain all fields, so using it would be more expensive. That will be evident in the execution plan. It would probably be better if the clustered index was the ID, with an extra index for Date. You could also use a Clustered Columnstore index, which essentially indexes *and* compresses the entire table. – Panagiotis Kanavos Oct 25 '19 at 12:38
  • Here's the execution plan: https://www.brentozar.com/pastetheplan/?id=rk3eMdgcH – suman M A Oct 25 '19 at 13:01
  • The query accesses rows via the clustered index, which is costly for the optimizer because it touches every row of the table. You can add a non-clustered index on the BusinessDate column, which improves selectivity and gains performance. However, the main question is who needs all columns of the PartitionTest table: if you want more performance you need a covering index, so select only the columns you actually require and add them as included columns. See https://stackoverflow.com/questions/62137/what-is-a-covered-index – Esat Erkec Oct 25 '19 at 13:26
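A covering index along the lines suggested in the last comment might look like this (the index name and the choice of included columns are illustrative; include only the columns the real query needs):

    -- Covers a query that selects col1 and col2 for a given BusinessDate,
    -- so SQL Server can answer it from the index alone without key lookups.
    CREATE NONCLUSTERED INDEX IX_PartitionTest_BusinessDate_Covering
    ON PartitionTest (BusinessDate)
    INCLUDE (col1, col2)
    ON partition_scheme_monthly (BusinessDate);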

1 Answer


Yes, this is expected.

To get equal, predictable performance for all dates you would either need an index with leading column BusinessDate or need to change your partition function to be more granular (daily instead of monthly).
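Either option could look roughly like the following (index and function names are illustrative; a daily function needs one boundary value per day):

    -- Option 1: a nonclustered index with BusinessDate as the leading column,
    -- partition-aligned, so a seek lands directly on the matching rows.
    CREATE NONCLUSTERED INDEX IX_PartitionTest_BusinessDate
    ON PartitionTest (BusinessDate)
    ON partition_scheme_monthly (BusinessDate);

    -- Option 2: a more granular (daily) partition function, so each partition
    -- holds a single date and the per-partition scan touches only matching rows.
    CREATE PARTITION FUNCTION partition_position_daily (DATE)
    AS RANGE RIGHT FOR VALUES ('2019-09-01', '2019-09-02', '2019-09-03' /* ... one boundary per day */);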

Without this, the best it can do is find the right partition for the date and scan rows until it finds 1,000 matching the date predicate. In your execution plan it reads the rows within the partition in clustered index key order and needs to read 2,001,000 rows before finding the first thousand matching the predicate BusinessDate = '2019-10-27'.

If you find that later dates are slower, they are likely correlated with your clustered index key (i.e. rows ordered later by the clustered index key also tend to have later dates).

Martin Smith