I have a table Credit:
create table Credit (ID_Credit int identity, ID_PayRequestStatus int, ... 20 more fields)
create nonclustered index Credit_ix_PayRequestStatus ON dbo.Credit(ID_PayRequestStatus)
The table has about 200k rows. The distribution of ID_PayRequestStatus is as follows:
| ID_PayRequestStatus | Number of rows |
|---|---|
| 400 | 198000 |
| 300 | 1000 |
| 200 | 490 |
| 100 | 450 |
| 999 | 250 |
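(These counts come from a plain GROUP BY, something along these lines:)
select ID_PayRequestStatus, count(*) as NumberOfRows
from dbo.Credit
group by ID_PayRequestStatus
order by NumberOfRows desc;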
If I run a query like this:
declare @ID_Status int = 200
select * from Credit where ID_PayRequestStatus = @ID_Status
It uses a clustered index scan (on the primary key), and SSMS suggests that I create another index on the same column (ID_PayRequestStatus) but with every remaining column in the INCLUDE clause, i.e. effectively covering the entire table. This sounds like classic parameter sniffing.
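For reference, this is how I look at the statistics the optimizer has for that column (assuming the statistics object shares the index name, which is the default for index statistics):
-- Density vector and histogram for the statistics behind Credit_ix_PayRequestStatus
DBCC SHOW_STATISTICS ('dbo.Credit', 'Credit_ix_PayRequestStatus');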
If I run the following query to defeat parameter sniffing (note that the value I optimize for is deliberately not the value I pass in):
declare @ID_Status int = 200
select * from Credit
where ID_PayRequestStatus = @ID_Status
OPTION (OPTIMIZE FOR (@ID_Status=100))
It uses an index seek on Credit_ix_PayRequestStatus, just as I intended.
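For comparison, the same query can also be run with a genuine parameter via sp_executesql; I'm including this variant only because the behaviour may differ between a local variable and a real parameter (I have not tested it):
-- Same predicate, but @ID_Status is a real parameter rather than a local variable
exec sp_executesql
    N'select * from dbo.Credit where ID_PayRequestStatus = @ID_Status',
    N'@ID_Status int',
    @ID_Status = 200;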
My main question is: why does optimizing for a value other than the one I actually compare against defeat parameter sniffing?
My secondary question is: why does SQL Server resort to parameter sniffing (i.e. the index scan) in the first place? It's a single table, the index is clearly defined, there are no joins, and there are no other criteria in the WHERE clause. I suspect it's because of the lopsided distribution of ID_PayRequestStatus, with 400 accounting for roughly 99% of the rows, so SQL Server decides that a scan is cheaper. However, queries against this table with ID_PayRequestStatus = 400 make up less than 1% of the total workload. Shouldn't auto-update statistics have kicked in by now and resolved this performance problem?
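In case it's relevant, statistics freshness can be checked with something like the following (assuming a version that has sys.dm_db_stats_properties, i.e. SQL Server 2008 R2 SP2 / 2012 SP1 or later):
-- Last update time, row counts and modifications since the last update
-- for every statistics object on dbo.Credit
select s.name as stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id('dbo.Credit');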