1

I have 2 tables: tbl_Token (~100 million rows) & tbl_EntryTokenSummary (~3.75 billion rows)

I'm trying to understand why the following 2 scenarios result in different 'net' plans. I don't really understand why Scenario 1 even references the non-clustered index, it shouldn't be relevant.

I think it's save to say the performance is a result of the 1418 scans, but why are they done?

/********************************************************************
* Scenario 1
********************************************************************/
Select top 100 
    *
From (
    Select 
        EntryId, FirstOccurence 
    From tbl_EntryTokenSummary 
    Where TokenId = (
        Select Id 
        From tbl_Token 
        Where TSig = dbo.GetTokenSignature('water') and Text = 'water') 
    and EntryTypeId in (1,2)
    ) a
Inner Join (
    Select 
        EntryId, FirstOccurence 
    From tbl_EntryTokenSummary 
    Where TokenId = (
        Select Id 
        From tbl_Token 
        Where TSig = dbo.GetTokenSignature('pipe') and Text = 'pipe') 
    and EntryTypeId in (1,2)
    ) b on a.EntryId = b.EntryId

This results in the following messages output:

2018-04-18 22:15:40.5925859

    (100 row(s) affected)
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tbl_EntryTokenSummary'. Scan count 1418, logical reads 20419, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tbl_Token'. Scan count 11, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

2018-04-18 22:15:43.9677926

And this plan: QueryPlan Scenario 1

However if I break out the TokenId lookups, the results change significantly:

/********************************************************************
* Scenario 2
********************************************************************/

Declare @t1 int = (Select Id From tbl_Token Where TSig = dbo.GetTokenSignature('water') and Text = 'water')
print sysdatetime()

Declare @t2 int = (Select Id From tbl_Token Where TSig = dbo.GetTokenSignature('pipe') and Text = 'pipe')
print sysdatetime()

Select top 100 
    ets1.EntryId, ets1.FirstOccurence, ets2.EntryId, ets2.FirstOccurence
From tbl_EntryTokenSummary ets1
Inner Join tbl_EntryTokenSummary ets2 On ets2.TokenId = @t2 and ets1.EntryId = ets2.EntryId 
Where ets1.TokenId = @t1

resulting in much better output:

2018-04-18 22:15:43.9677926

    Table 'tbl_Token'. Scan count 6, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

2018-04-18 22:15:43.9677926

    Table 'tbl_Token'. Scan count 5, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

2018-04-18 22:15:43.9677926

    (100 row(s) affected)
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tbl_EntryTokenSummary'. Scan count 2, logical reads 1019, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

2018-04-18 22:15:44.1084285

QueryPlan Scenario 2

EDIT:

While reviewing the table set up, I realized I had not set the primary key, only a partitioned clustered index (which was actually unique, but not marked as unique).

Just adding the Primary Key actually made things worse, but once I dropped all indexes and added back the Primary Key (as partitioned/clustered), the query optimizer calculated a more correct plan did the exact same thing.

Original query plan: https://www.brentozar.com/pastetheplan/?id=Hku0KKH3f

Updated query plan (after Primary Key): https://www.brentozar.com/pastetheplan/?id=HJu3kwo2f

Side Note: I was directed to this link as an explanation of that plan that was generated: https://sqlperformance.com/2014/01/sql-plan/starjoininfo-in-execution-plans

  • well... yes and no. I get that different queries form different plans, I'm trying to understand why the first scenario seems to unnecessarily involve the non-clustered index. I'd rather not break the query out using variables, but the performance difference is unacceptable at this point. – Corey Aldebol Apr 19 '18 at 03:25
  • is this SQL Server - can you tag appropriately? – Nick.Mc Apr 19 '18 at 03:32
  • Yes, and done... – Corey Aldebol Apr 19 '18 at 03:37
  • When you don't paramertize it, a plan is created for each set of literals. Check out Brent Ozar and other folks posts on parameter sniffing, local variables, and literals – S3S Apr 19 '18 at 03:42
  • Right... but neither scenario is 'parameterized', so that's not really a difference in the scenarios – Corey Aldebol Apr 19 '18 at 03:56
  • Scenario 2 is parameterised – Nick.Mc Apr 19 '18 at 13:37
  • @Nick.McDermaid I disagree. I am using variables, but scenario 2 is the combination of all 3 statements. There are no input parameters to the scenario, so it is in the same starting position as scenario 1. – Corey Aldebol Apr 20 '18 at 13:00
  • In the first query, what happens if you use `SELECT MAX(Id)` instead of `SELECT Id`? In the top query, you're using a table result. In the second query you're using a scalar. Perhaps that is confusing it. – Nick.Mc Apr 21 '18 at 23:42

0 Answers0