6

I'm finding that, on an indexed view with appropriate indexes, MAX(date) is performing an entire index scan followed by a stream aggregate whereas TOP (1) date is optimally using the index and only scanning a single row. For large numbers of rows this is leading to serious performance issues. I've included some code to demonstrate the issue below but would be interested to know if others can explain why this behaviour is occuring (it does not occur on a table with similar index) and whether it is a bug in SQL Server's optimiser (I've tested on both 2008 SP2 and on R2, and both show the same issues).

CREATE TABLE dbo.TableWithDate
(
  id INT IDENTITY(1,1) PRIMARY KEY,
  theDate DATE NOT NULL
);

CREATE NONCLUSTERED INDEX [ix_date] ON dbo.TableWithDate([theDate] DESC);

INSERT INTO dbo.TableWithDate(theDate) VALUES('1 MAR 2010'),('1 MAR 2010'), ('3 JUN 2008');

-- Test 1:  max vs top(1) on the table.  They give same optimal plan (scan one row from the index, since index is in order)
SELECT TOP(1) theDate FROM dbo.TableWithDate ORDER BY theDate DESC;
SELECT MAX(theDate) FROM dbo.TableWithDate;

CREATE TABLE dbo.TheJoinTable
(
  identId INT IDENTITY(1,1) PRIMARY KEY,
  foreignId INT NOT NULL,
  someValue INT NOT NULL
);

CREATE NONCLUSTERED INDEX [ix_foreignValue] ON dbo.TheJoinTable([foreignId] ASC);

INSERT INTO dbo.TheJoinTable(foreignId,someValue) VALUES (1,10),(1,20),(1,30),(2,5),(3,6),(3,10);

GO

CREATE VIEW dbo.TheTablesJoined 
WITH SCHEMABINDING
AS 
  SELECT T2.identId, T1.id, T1.theDate, T2.someValue
  FROM dbo.TableWithDate AS T1
  INNER JOIN dbo.TheJoinTable AS T2 ON T2.foreignId=T1.id
GO

-- Notice the different plans:  the TOP one does a scan of 1 row from each and joins
-- The max one does a scan of the entire index and then does seek operations for each item (less efficient)
SELECT TOP(1) theDate FROM dbo.TheTablesJoined ORDER BY theDate DESC;

SELECT MAX(theDate) FROM dbo.TheTablesJoined;

-- But what about if we put an index on the view?  Does that make a difference?
CREATE UNIQUE CLUSTERED INDEX [ix_clust1] ON dbo.TheTablesJoined([identId] ASC);
CREATE NONCLUSTERED INDEX [ix_dateDesc] ON dbo.TheTablesJoined ([theDate] DESC);

-- No!!!! We are still scanning the entire index (look at the actual number of rows) in the MAX case.
SELECT TOP(1) theDate FROM dbo.TheTablesJoined ORDER BY theDate DESC;

SELECT MAX(theDate) FROM dbo.TheTablesJoined;
Paul McLoughlin
  • 2,283
  • 1
  • 15
  • 15
  • 2
    This sounds more like a feature that hasn't been implemented yet in the optimizer rather than a bug. What if you use the `NOEXPAND` table hint? – Gabe Dec 20 '10 at 07:38
  • Just out of curiosity, what happens if you declare a foreign keys between the tables? – Ronnis Dec 20 '10 at 14:34
  • Makes no difference if I add a foreign key constraint (I should have mentioned that I'd tried that). However, adding WITH (NOEXPAND) to the query of the indexed view does force it to choose a different (optimal) plan. Thanks for recommending that I try that. – Paul McLoughlin Dec 20 '10 at 15:19

3 Answers3

2

John Sansom covered the performance characteristics of MAX vs. TOP, however his results didn't specifically answer your question.

I think the answer lies in the fact that MAX is a general purpose aggregate function geared toward crunching pages and pages of data, where TOP is an operator geared toward only restricting the number of rows being fetched.

In this one narrow use case, both example queries are capable of going after the same thing, and can return the same result. The query using TOP is benefiting from the specific optimizations afforded by using that method for this use case.

I dumped out the XML plans for both queries, and the statement using MAX contained:

<DefinedValues>
  <DefinedValue>
    <ColumnReference Column="Expr1004" />
    <ScalarOperator ScalarString="MAX([db].[dbo].[TheTablesJoined].[theDate])">
      <Aggregate AggType="MAX" Distinct="false">
        <ScalarOperator>
          <Identifier>
            <ColumnReference Database="[db]" Schema="[dbo]" Table="[TheTablesJoined]" Column="theDate" />
          </Identifier>
        </ScalarOperator>
      </Aggregate>
    </ScalarOperator>
  </DefinedValue>
</DefinedValues>

The statement using TOP contained this in place of the XML defining what was being aggregated in the MAX query:

<TopExpression>
  <ScalarOperator ScalarString="(1)">
    <Const ConstValue="(1)" />
  </ScalarOperator>
</TopExpression>

There's a lot less going on in the execution plan when using TOP.

arcain
  • 14,920
  • 6
  • 55
  • 75
1

To evaluate the value of any aggregate function like max, all the rows in a table must be read and because one of its values is used in the evaluation. Top 1 only has to read one row, this can be done very quick when it is not forced by order by and no suittable index to scan the whole table. In those cases you can create a suitable index to increase performance.

bernd_k
  • 11,558
  • 7
  • 45
  • 64
  • "Top 1 only has to read one row" - are you sure about that? If we have a TOP with ORDER BY (99% of cases), im pretty sure the engine must read all rows to order them, and then perform the TOP. Or am i wrong... – RPM1984 Dec 20 '10 at 07:26
  • 4
    RPM1984: The optimizer knows sees that the `ORDER BY` clause matches the order of the index so it knows that it doesn't have to sort, meaning it doesn't have to read all the rows. – Gabe Dec 20 '10 at 07:39
  • @RPM 99% that is very good. I often see Top used without order by (a fault). But In your case just add the required indexes. – bernd_k Dec 20 '10 at 08:05
0

What edition of SQL Server? Only Enterprise and Developer will use indexed views automatically, other editions will expand the query to go against the underlying tables.

You want to specify the NOEXPAND query hint. See the answer for How can i speed up this Indexed View?

Community
  • 1
  • 1
sisve
  • 19,501
  • 3
  • 53
  • 95