14

I have 3 versions of a query that ultimately returns the same results.

One of them gets dramatically slower when adding an extra inner join to a relatively small table AND where parameter variables are used within the where clause.

The execution plan is very different for the fast and slow queries (included below each query).

I want to understand why this happened and how to prevent it.

This query takes < 1 second. It does not have an extra inner join but it uses parameter variables in the where clause.

    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct sups.campaignid 
    from tblSupporterMainDetails sups
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Parallelism(Gather Streams)
       |--Sort(DISTINCT ORDER BY:([sups].[campaignID] ASC))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
                 |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
                      |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
                           |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1004]))
                           |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
                           |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
                           |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
                                |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1004],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))

This query takes < 1 second. It has an extra inner join BUT uses parameter constants in the where clause.

    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between '20120115' and '20120116'

  |--Parallelism(Gather Streams)
       |--Hash Match(Right Semi Join, HASH:([sups].[campaignID])=([camps].[campaignID]))
            |--Bitmap(HASH:([sups].[campaignID]), DEFINE:([Bitmap1007]))
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
            |         |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
            |              |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
            |                   |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1006]))
            |                   |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
            |                   |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
            |                   |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
            |                        |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1006],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([camps].[campaignID]))
                 |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]),  WHERE:(PROBE([Bitmap1007],[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID],N'[IN ROW]')))

This query takes 2 minutes. It has an extra inner join AND it uses parameter variables in the where clause.

    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
       |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
       |--Top(TOP EXPRESSION:((1)))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([calls].[callID], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterID_closingCall] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[AUTOGEN_PK_tblCallLogs] AS [calls]), SEEK:([calls].[callID]=[GOGEN].[dbo].[tblCallLogs].[callID] as [calls].[callID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) LOOKUP ORDERED FORWARD)

Notes:

  • I believe that the slowness is being caused by the Clustered Index Seek on tblCallLogs, however I don't know why SQL Server would choose this execution plan.
  • Should I be using a query optimiser hint? I have needed to and am reluctant to tell SQL Server how to do its job...
  • The problem seems to be caused by a combination of factors - an extra join AND variables.
  • Could the execution plan be trying to reuse a 'bad' plan when it finds variables for the query?
  • In real life I will have to use parameter variables. Constants are no good! So this problem could exist in many of my queries/stored procedures!
  • I have rebuilt indexes and updated statistics on tblCampaigns and tblSupporterMainDetails. This had no effect.
  • Both tables have clustered indexes on the primary key (identity integer).
  • The foreign key column campaignid is indexed.
  • All queries use the same parameter values - weather it be used as a variable or a constant.

Number of records in tables:

  • tblSupporterMainDetails = 12,561,900
  • tblCallLogs = 27,242,224
  • tblCampaigns = 756

UPDATE:

  • I have also rebuilt indexes and updated statistics on tblcalllogs. No effect.
  • I have cleared the execution plan cache using DBCC FREEPROCCACHE
  • tblCallLogs.callEnd is a datetime.

Schemas of involved columns:

tblCampaign.campaignid int not null
tblSupporterMainDetails.campaignid int not null
tblSupporterMainDetails.supporterid int not null
tblCallLogs.supporterid int not null
tblCallLogs.callEnd datetime not null

Indexes:

Indexes

UPDATE 2: After adding index to tblCallLogs.supporterId - with include column: callEnd
The 'slow' query speeded up to 40 seconds. the updated execution plan:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
   |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
   |--Top(TOP EXPRESSION:((1)))
        |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
             |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
             |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterid_callend] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) ORDERED FORWARD)

SOLUTION:

The extra join was not actually causing the problem directly, however it obviously changed the statement so that sql server held a different execution plan for it.
By adding OPTION(RECOMPILE) to the end of the slow statement, I was able get the expected fast performance. i.e. < 1 second. I am still unsure exactly this solution worked - why didnt flushing all plan work? is this a classic case of parameter sniffing? I will update this post as I learn the exact answer - or until someone can give a clear answer. Thanks to both @LievenKeersmaekers and @JNK for helping so far...

make_transition
  • 452
  • 1
  • 5
  • 12
  • 1
    from your explanation it seems you didn't update statistics on `tblCallLogs`?! – Lieven Keersmaekers Mar 28 '13 at 13:44
  • 1
    Can you show the schema for your tables? What datatype is `calls.callend`? – JNK Mar 28 '13 at 13:57
  • @LievenKeersmaekers - Correct, I had not done this originally as i thought the problem was sure to be between tblcampaigns and tblsupporterMainDetails. I have now done this but unfortunately it did not make a difference. – make_transition Mar 28 '13 at 14:14
  • 1
    I notice you have an index on `callend, supporterid`. Can you try creating a covering index the other way around on `supporterid, callend`? – Lieven Keersmaekers Mar 28 '13 at 14:26
  • @JNK - callEnd is a datetime - i am reluctant to show all columns for the tables, there are a lot! plus some are business sensitive and i believe have any relevance to the issue, although perhaps im being over cautious. Are there any particular columns that you are interested in e.g. those involved in fks/pks - or do you want to see the schema for fks/pks/indexes rather than columns? – make_transition Mar 28 '13 at 14:35
  • @Bakwon any column in the query in `SELECT`, `JOIN`, or `WHERE` – JNK Mar 28 '13 at 14:37
  • 1
    @Bakwon And is campaignID an indexed PK in campaign? – JNK Mar 28 '13 at 14:58
  • @LievenKeersmaekers - I have added the index you suggested. This has actually improved the performance of the query - the slow query now takes 40 seconds. I have included a screen shot of all the indexes on the tables involved. I fear that I may have too many indexes - although they seem logical to me, or have been suggested by SQL Server as the table has grown. – make_transition Mar 28 '13 at 14:58
  • @Bakwon - It's a start :). Can you post the updated execution plan? *(for all intents and purposes: having many indexes will not slow down selects, only inserts/updates/deletes)* – Lieven Keersmaekers Mar 28 '13 at 15:00
  • @JNK - Yes, campaignID is an indexed PK in campaign. I have uploaded a screen shot showing all indexes (just re-uploaded as it was missing indexes from tblCampaigns...) – make_transition Mar 28 '13 at 15:04
  • 1
    @Bakwon - You can read up on the reason **why** using parameters might result in other execution plans [here](http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/bd432d12-f296-4f94-9c33-5d2670c7fee7/) – Lieven Keersmaekers Mar 28 '13 at 15:13
  • @Bakwon is there a `TOP` in your query that you aren't including in the text you gave us? It's in the exec plan but normally that is either from an explicit `TOP` or an `EXISTS` subquery in your `WHERE` clause – JNK Mar 28 '13 at 15:14
  • @JNK - no, I am definitely not using TOP. I have included everything in the executed query. I saw that in the execution plan as well, but I assumed it was part of the internal sql server processing? – make_transition Mar 28 '13 at 15:23
  • @LievenKeersmaekers - I have come across parameter sniffing before. I was under the impression this was only something that happened with stored procedures. I normally get around the problem by assigning the parameter passed into the sp to a locally defined parameter. e.g. `declare @local_start datetime = @start; There are a few articles to read from the post you suggested... going through them now. – make_transition Mar 28 '13 at 15:38
  • @JNK - The TOP (1) expression shown in the execution plan is actually called 756 times - once for every campaign. Internally, it must be stepping through each campaign record, performing the nested loop for each one. – make_transition Mar 28 '13 at 15:49
  • @LievenKeersmaekers - I added OPTION(RECOMPILE) to the end of the slow select statement - bingo! its now super fast like the other 2 queries. I am still unsure why this worked - why didn't clearing the plan cache have the same effect? Out of interest, I tried add OPTION(RECOMPILE) to the existing fast queries - and they too gained a small speed (using client statistics) from average of 270ms to average of 230ms. – make_transition Mar 28 '13 at 16:20
  • @LievenKeersmaekers - I will keep on investigating - parameter sniffing is still a bit unpredictable to me! It would seem that many of my queries would benefit from always rebuilding the plan! – make_transition Mar 28 '13 at 16:30
  • @Bakwon - Sorry, been away. Just a note of warning, don't use the option on every query you use. You should Judge each statement in by itself if it would benefit from it. – Lieven Keersmaekers Mar 28 '13 at 18:02
  • @Bakwon - I have tried to summarize the comments in an answer. Feel free to adjust/correct it where appropriate. – Lieven Keersmaekers Mar 28 '13 at 18:13
  • @LievenKeersmaekers - Back from Easter holidays... :) I very much appreciate not using the recompile option on all my queries. I was defintely not going to do this - however i have suffered and solved parameters sniffing problems in the past - all within report stored procedures and all with the start/end date parameters. All of my queries that include this table have parameter sniffing issues. I was not aware that non stored procedures could also have this problem so something learnt. – make_transition Apr 02 '13 at 15:00

1 Answers1

1

A summary of what lead to a solution:

Add a covering index on supporterid, callEnd.

The assumption here is that the optimizer can use this index (in contrast with callEnd, supporterid) to

  • first join tblSupporterMainDetailsand tblCallLogs
  • further use it in the where clause for selecting callEnd

Add the option OPTION(RECOMPILE)

all cudo's to TiborK and Hunchback for explaining the difference to the optimizer of using hard coded constants or variables.

Performance Impact - Constant value -vs- Variable

When you use the constant, the value is known to the optimizer so it can determine selectivity (and possible index usage) based on that. When you use a variable, the value is unknown to the optimizer (so it have to go by some hardwired value or possibly density info). So, technically, this isn't parameter sniffing, but whatever article you find on that subject should also explain the difference between a constant and a variable. Using OPTION(RECOMPILE) will actually turn the variabe to a parameter sniffing situation.

In essence, there is a big difference between a constant, a variable and a paramater (whcih can be sniffed).

Lieven Keersmaekers
  • 57,207
  • 13
  • 112
  • 146
  • Adding the covering index was a good thing - however the query hint 'OPTION(RECOMPILE)' was the key to stopping the query use the 'bad' execution plan. I am still confused why a bad plan was used, despite clearing the plan cache. If parameter sniffing did indeed sniff my parameters then why not create a 'good' plan? I dont understand why forcing the query to recompile everytime means that it can pick a better plan? Its not as if i changed the parameter values between tests, so surely sniffing should have created the same plan as a forced recompile? – make_transition Apr 02 '13 at 15:16
  • In addition, I am a little confused by query 1 in the original question. This uses parameters, but runs very quickly. Therefore in this case parameter sniffing has not caused a problem. But the values of the parameters are the same - and these parameters are being used to filter the same table. It would appear that identical values for use in identical where clauses can be sniffed by different queries, and different plans created. The query optimiser should look at a few plans and use the fastest - but its let me down - And all i did was join an extra table! – make_transition Apr 02 '13 at 15:38
  • @Bakwon - The optimizer tries to generate a plan based on the best average values it has statistics off. You clearing the plan cache likely only resulted in the optimizer generating the same *(bad)* general plan as it did before and neglect the parameters. By using option(recompile), you effectively tell the optimizer not to use the general plan, but come up with a plan for this specific statement where it can use parameter sniffing (you've passed them in the same batch). Note that this is *my* understanding of what is happening. – Lieven Keersmaekers Apr 02 '13 at 16:51