
I've found that in execution plans using common subexpression spools, the reported logical reads get quite high for large tables.

After some trial and error I've found a formula that seems to hold for the test script and execution plan below:

Worktable logical reads = 1 + NumberOfRows * 2 + NumberOfGroups * 4
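
For the five-row, three-customer data set in the script below this predicts 1 + 5 * 2 + 3 * 4 = 23 logical reads, which matches the STATISTICS IO output further down.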

I don't understand why this formula holds, though. It is more than I would have thought necessary from looking at the plan. Can anyone give a blow-by-blow account of what's going on that accounts for this?

Or, failing that, is there any way of tracing which page was read by each logical read so I can work it out for myself?

SET STATISTICS IO OFF; SET NOCOUNT ON;

IF Object_id('tempdb..#Orders') IS NOT NULL
  DROP TABLE #Orders;

CREATE TABLE #Orders
  (
     OrderID    INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
     CustomerID NCHAR(5) NULL,
     Freight    MONEY NULL
  );

CREATE NONCLUSTERED INDEX ix
  ON #Orders (CustomerID)
  INCLUDE (Freight);

INSERT INTO #Orders
VALUES (N'ALFKI', 29.46), 
       (N'ALFKI', 61.02), 
       (N'ALFKI', 23.94), 
       (N'ANATR', 39.92), 
       (N'ANTON', 22.00);

SELECT PredictedWorktableLogicalReads = 
        1 + 2 * Count(*) + 4 * Count(DISTINCT CustomerID)
FROM   #Orders;

SET STATISTICS IO ON;

SELECT OrderID,
       Freight,
       Avg(Freight) OVER (PARTITION BY CustomerID) AS Avg_Freight
FROM   #Orders; 

Output

PredictedWorktableLogicalReads
------------------------------
23

Table 'Worktable'. Scan count 3, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Orders___________000000000002'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution Plan

Additional Info:

There is a good explanation of these spools in Chapter 3 of the Query Tuning and Optimization book and in this blog post by Paul White.

In summary, the Segment iterator at the top of the plan adds a flag to the rows it sends, indicating when a row is the start of a new partition. The primary segment spool gets one row at a time from the Segment iterator and inserts it into a worktable in tempdb. Once it gets the flag saying that a new group has started, it returns a row to the top input of the Nested Loops operator. This causes the Stream Aggregate to be invoked over the rows in the worktable; the average is computed, then this value is joined back with the rows in the worktable before the worktable is truncated ready for the new group. The segment spool emits a dummy row in order to get the final group processed.

As far as I understand, the worktable is a heap (otherwise it would be denoted in the plan as an index spool). However, when I try to replicate the same process step by step below, it needs only 11 logical reads.

CREATE TABLE #WorkTable
  (
     OrderID    INT,
     CustomerID NCHAR(5) NULL,
     Freight    MONEY NULL
  )

DECLARE @Average MONEY

PRINT 'Insert 3 Rows'

INSERT INTO #WorkTable
VALUES      (1, N'ALFKI', 29.46) /*Scan count 0, logical reads 1*/

INSERT INTO #WorkTable
VALUES      (2, N'ALFKI', 61.02) /*Scan count 0, logical reads 1*/

INSERT INTO #WorkTable
VALUES      (3, N'ALFKI', 23.94) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM   #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

/*This convoluted query is just to force a nested loops plan*/
SELECT *
FROM   (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
       OUTER APPLY #WorkTable
WHERE  COALESCE(Freight, OrderID) IS NOT NULL
       AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Insert 1 Row'

INSERT INTO #WorkTable
VALUES      (4, N'ANATR', 39.92) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM   #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM   (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
       OUTER APPLY #WorkTable
WHERE  COALESCE(Freight, OrderID) IS NOT NULL
       AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Insert 1 Row'

INSERT INTO #WorkTable
VALUES      (5, N'ANTON', 22.00) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM   #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM   (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
       OUTER APPLY #WorkTable
WHERE  COALESCE(Freight, OrderID) IS NOT NULL
       AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM   #WorkTable /*Scan count 1, logical reads 0*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM   (SELECT @Average AS Avg_Freight) T
       OUTER APPLY #WorkTable
WHERE  COALESCE(Freight, OrderID) IS NOT NULL
       AND @Average IS NOT NULL

DROP TABLE #WorkTable 
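
Tallying the STATISTICS IO comments above: 5 single-row inserts + 3 scans for the AVG + 3 scans for the join-back = 11 logical reads (the final AVG over the empty worktable costs 0), well short of the 23 reported for the spool plan.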
Martin Smith

2 Answers


Logical reads are counted differently for worktables: there is one 'logical read' per row read. This does not mean that worktables are somehow less efficient than a 'real' spool table (quite the reverse); the logical reads are just in different units.

I believe the thinking was that counting hashed pages for worktable logical reads would not be very useful because these structures are internal to the server. Reporting rows spooled in the logical reads counter makes the number more meaningful for analysis purposes.

This insight should make the reason your formula works clear. The two secondary spools are fully read twice (2 * COUNT(*)), and the primary spool emits (number of group values + 1) rows as explained in my blog entry, giving the (COUNT(DISTINCT CustomerID) + 1) component. The plus one is for the extra row emitted by the primary spool to indicate the final group has ended.
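
As a back-of-the-envelope check in those row units, using the test data from the question (a sketch only; the column aliases are my own labels, not anything SQL Server reports):

SELECT SecondarySpoolRows = 2 * Count(*),                  -- 10: both secondary spools fully read
       PrimarySpoolRows   = Count(DISTINCT CustomerID) + 1 -- 4: one row per group plus the final dummy row
FROM   #Orders;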

Paul

  • Ah, that does indeed explain things. Thanks very much for taking the time to answer this as it puzzled me for a few months! – Martin Smith Mar 04 '11 at 14:31
  • @MartinSmith & SQLkiwi -- is there enough here for a derivative question (I'll post one if you think there's enough meat) -- how do you compare two equivalent queries for efficiency when one uses windowed functions and one does not? Reads/writes/CPU in Profiler and the execution plan are my go-to, but this complicates the matter. – EBarr Sep 27 '12 at 16:05

In the formula you give, the NumberOfRows * 2 term would hold true because the Sort operator and the Stream Aggregate shown in your execution plan both need all the rows to complete processing. Can you confirm a decrease in logical reads when a WHERE clause (sketched after this list) is added for:

  1. a value of Freight
  2. CustomerID
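
A minimal sketch of that experiment against the question's test data (the ALFKI value is just an arbitrary pick from the sample rows; a Freight predicate would work the same way):

SET STATISTICS IO ON;

SELECT OrderID,
       Freight,
       Avg(Freight) OVER (PARTITION BY CustomerID) AS Avg_Freight
FROM   #Orders
WHERE  CustomerID = N'ALFKI';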
RC_Cleland
  • Have you seen any physical reads for this query or during your development of the formula? Are you able to use DBCC DROPCLEANBUFFERS and capture the physical and logical reads again? Could you add a WHERE clause to the query and rerun? Thanks – RC_Cleland Nov 20 '10 at 20:52