SQL Server Execution plan when combining multiple tables

Question

I have a stored procedure that issues a query similar to the one below (pseudo-tsql).

Multiple ParentIds are passed in as a parameter (csv), parsed, and inserted into a table variable @i. For each ParentId passed in, we look up the StorageTable and include it in @i. Now, depending on the value of the StorageTable column, we need to fetch data out of the appropriate table (Table1, Table2, or Table3) by ParentId. There is no chance of duplicates across multiple tables - hence the UNION ALL.

When I examine the actual execution plan, I find that most of the subtree cost (more than half) is attributed to StorageTable values that were not even provided as inputs.

For example, if I included a StorageTable = 'Table1', Table2's Index Scan would show up with a high cost in the execution plan.

As I would expect, STATISTICS IO did not show any reads against Table2, but the data access points appear expensive according to the actual execution plan.

In my mind, if a particular StorageTable is not present, the inner join with @i will return an empty result set and "short circuit" any additional work, no?

What could be the solution?

DECLARE @i AS TABLE
(
    ParentId INT,
    StorageTable VARCHAR(10)
)

INSERT INTO @i...
INSERT INTO @i...
INSERT INTO @i...

SELECT Col1, Col2, Col3
FROM dbo.Table1 AS T1
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table1') AS I
    ON T1.ParentId = I.ParentId
<joins>
<where clause>

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table2 AS T2
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table2') AS I
    ON T2.ParentId = I.ParentId
<joins>
<where clause>

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table3 AS T3
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table3') AS I
    ON T3.ParentId = I.ParentId
<joins>
<where clause>

Your joins could also be written as: `INNER JOIN @i AS I ON T1.ParentId = I.ParentId AND I.StorageTable = 'Table1'`, so the subqueries are not necessary. Could you add an execution plan? — NickyvV, Jan 21 '14 at 12:32
The costs shown in the execution plan are not reliable in this sort of case. Even in the actual plan they are just the estimated costs so do not necessarily reflect actual runtime cost. — Martin Smith, Jan 21 '14 at 12:37
Please just post the (actual) plan. It could have any number of shapes. — usr, Jan 21 '14 at 12:39
Table variables have no statistics. Stats are highly relevant here to choose the right join operator. Even if you use a temp table with stats, the stats might get out of date. — usr, Jan 21 '14 at 12:41
@usr - Presumably a nested loops with `@i` as the outer table as `STATISTICS IO` doesn't show any reads against `Table2` — Martin Smith, Jan 21 '14 at 12:43
@MartinSmith right. Ok what's his problem then? I do not understand it. Seems to work fine. — usr, Jan 21 '14 at 12:44
@usr - I think they are just confused by the fact that these still show up as high costs in the plan (will estimate one row emitted from the table variable and one execution of the operation against `Table2` even though actual executions of the operator is 0) — Martin Smith, Jan 21 '14 at 12:45

Martin Smith · Accepted Answer · 2014-01-21T13:28:54.620


You should pretty much ignore the subtree costs in this case.

Even in the actual plan they are just based on estimates.

From what you say about the STATISTICS IO output, for example, the actual number of executions of the operators accessing Table2 is 0.

However, the plan will likely estimate Number of Executions = 1.

(You can see estimated and actual figures in the properties window in SSMS after selecting an operator)
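The same figures can also be read from the plan XML instead of the SSMS properties window. The following is only a sketch: it reuses a one-row table variable like the one in the example further down, and joins to `sys.objects` purely as a stand-in for the question's `Table2`.

```sql
-- Return the actual execution plan as XML alongside each result set.
SET STATISTICS XML ON;

-- Stand-in for the question's @i: it contains no 'Table2' rows.
DECLARE @T TABLE (X INT, StorageTable VARCHAR(50));
INSERT INTO @T VALUES (1, 'Table1');

-- Stand-in for the Table2 branch (sys.objects is an assumption, not the real table).
SELECT T.X
FROM   @T AS T
       JOIN sys.objects AS O
         ON O.object_id = T.X
WHERE  T.StorageTable = 'Table2';

SET STATISTICS XML OFF;
```

In the returned plan XML, each operator's `RunTimeCountersPerThread` element carries `ActualRows` and `ActualExecutions`, while the estimated figures (for example `EstimateRows` and `EstimatedTotalSubtreeCost`) sit on the `RelOp` element. If the plan is a nested loops join with `@T` as the outer input, the operators on the inner side would be expected to report `ActualExecutions="0"` even though they still carry an estimated cost.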

If some branches of the plan underestimate the number of executions, you could try using a #temp table instead, so that column statistics are taken into account.
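A minimal sketch of the temp table variant, assuming the same two columns as the question's `@i`. The name `#i`, the sample rows, and the join to `sys.objects` (standing in for `Table2`) are illustrative only.

```sql
-- Hypothetical temp-table replacement for the question's @i.
CREATE TABLE #i
(
    ParentId     INT         NOT NULL,
    StorageTable VARCHAR(10) NOT NULL
);

INSERT INTO #i (ParentId, StorageTable)
VALUES (1, 'Table1'),
       (2, 'Table1'),
       (3, 'Table1');

-- Unlike a table variable, #i can have column statistics, so the row
-- estimate for this filter can be based on the data actually loaded.
SELECT I.ParentId, O.name
FROM   #i AS I
       JOIN sys.objects AS O   -- stand-in for dbo.Table2
         ON O.object_id = I.ParentId
WHERE  I.StorageTable = 'Table2';

DROP TABLE #i;
```

Whether this changes the plan depends on the data and on how fresh the statistics are, so it is worth comparing the estimated and actual row counts before and after the change.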

You could get somewhat more representative subtree costs by adding some helper variables and OPTION (RECOMPILE) but still they are only as accurate as the modelling assumptions and the estimates.

For example

DECLARE @T TABLE(
  X            INT,
  StorageTable VARCHAR(50));

INSERT INTO @T
VALUES      (1, 'Table1');

-- Helper variables: record which branches have any matching rows in @T.
DECLARE @Branch1Exists BIT = IIF(EXISTS(SELECT * FROM @T WHERE StorageTable = 'Table1'), 1, 0);
DECLARE @Branch2Exists BIT = IIF(EXISTS(SELECT * FROM @T WHERE StorageTable = 'Table2'), 1, 0);

SELECT X
FROM   @T
       JOIN master..spt_values V
         ON [@T].X = number
WHERE  @Branch1Exists = 1
UNION ALL
SELECT X
FROM   @T
       JOIN sys.objects
         ON [@T].X = object_id
WHERE  @Branch2Exists = 1
-- With RECOMPILE the variable values are known when the statement is compiled.
OPTION (RECOMPILE);

This removes, at compile time, the branch of the plan that isn't executed, rather than showing costs for an estimated single execution.

edited Jan 21 '14 at 13:28

answered Jan 21 '14 at 12:56

Martin Smith


thanks for the reply. Perhaps STATISTICS IO is a better indicator of the actual activity in this case? As far as the execution plan details, Table2 shows an Estimated Operator Cost of 55%, Estimated Number of Executions = 5000 (I should have mentioned there was a Top(5000) on each TableX), Actual Executions = 0, Estimated Number of Rows = 1, Actual number Of Rows = 0. This seems to be an overestimate of the number of executions, right? I just don't want resources spent on Table2 if it doesn't apply. – John Russell Jan 21 '14 at 13:07
Ah, I didn't notice you had other tables involved in `<joins>`. If it was just the table variable, it would have been estimated executions 1 unless you used `OPTION (RECOMPILE)`. But yes, if `Actual Executions = 0` then the cost is basically `0` too. – Martin Smith Jan 21 '14 at 13:11
That worked great and made my execution plan much more manageable. Thanks for sharing! – John Russell Jan 21 '14 at 13:49
For production code, do you recommend including the `OPTION (RECOMPILE)` query hint in a case such as this, so that a branch that's not applicable is excluded? The recompile option seems to be a debatable topic, but maybe it makes sense in a case like this? – John Russell Jan 23 '14 at 13:26
@JohnRussell - Based on the information in the question it doesn't sound of benefit here. Because it sounds as though at runtime the irrelevant branches don't really have any cost anyway. You could try both with and without the hint and see if it makes any improvement to `STATISTICS IO`/`STATISTICS TIME` output though. – Martin Smith Jan 23 '14 at 13:34
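The comparison suggested in the last comment could be set up roughly as follows. This is a sketch only: it reuses the answer's one-row table variable, and the join to `sys.objects` again stands in for the question's `Table2`.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @T TABLE (X INT, StorageTable VARCHAR(50));
INSERT INTO @T VALUES (1, 'Table1');

DECLARE @Branch2Exists BIT =
    IIF(EXISTS (SELECT * FROM @T WHERE StorageTable = 'Table2'), 1, 0);

-- Variant 1: no hint; the branch stays in the plan but may never execute.
SELECT T.X
FROM   @T AS T
       JOIN sys.objects AS O
         ON O.object_id = T.X
WHERE  T.StorageTable = 'Table2';

-- Variant 2: helper variable plus OPTION (RECOMPILE).
SELECT T.X
FROM   @T AS T
       JOIN sys.objects AS O
         ON O.object_id = T.X
WHERE  T.StorageTable = 'Table2'
       AND @Branch2Exists = 1
OPTION (RECOMPILE);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The Messages tab then shows the reads and timings per statement. If both variants report no reads against the stand-in table, the hint is only tidying the plan, and its compile cost on every execution may not be worth paying.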
