TLDR: This is a bug in the product (reported here).
The poor practice that exposes this bug is mismatched datatypes (varchar
column being compared to nvarchar
) - on SQL collations this would just cause an implicit cast of the column to nvarchar
and a full scan.
On Windows collations this can still result in a seek. This is generally a useful performance optimisation but here you have hit an edge case...
More Detail: use the below setup...
CREATE TABLE dbo.TEMP
(
ID INT IDENTITY PRIMARY KEY,
[TESTCOLUMN] [varchar](50) COLLATE Latin1_General_CI_AS NULL INDEX [I_TEMP_TESTCOLUMN],
Filler AS CAST('X' AS CHAR(8000)) PERSISTED
)
--Add 7 rows where TESTCOLUMN is NOT NULL
INSERT dbo.TEMP([TESTCOLUMN]) VALUES ('aardvark'), ('badger'),
('badges'), ('cat'),
('dog'), ('elephant'),
('zebra');
--Add 49 rows where TESTCOLUMN is NULL
INSERT dbo.TEMP([TESTCOLUMN])
SELECT NULL
FROM dbo.TEMP T1 CROSS JOIN dbo.TEMP T2
Then first look at the actual execution plan for
SELECT COUNT(*)
FROM dbo.TEMP
WHERE TESTCOLUMN = N'badger'
OPTION (RECOMPILE)

In SQL Collations the implicit cast to nvarchar
would make the predicate entirely unsargable. With windows collations SQL Server is able to add the apparatus to the plan where the compute scalar calls an internal function GetRangeThroughConvert(N'badger',N'badger',(62))
and the resultant values end up being fed into a nested loops join to give start and end points for an index seek. (the article "Dynamic Seeks and Hidden Implicit Conversions" has some more details about this plan shape)
It is not exposed in the execution plan what the range start and end values are that this internal function returns but it is possible to see them if you happen to have a SQL Server build available where the short lived query_trace_column_values
extended event has not been disabled. In the case above the function returns (badger, badgeS, 62)
and these values are used in the index seek. As I added a row with the value "badges" in this case the seek ends up reading one more row than strictly necessary and the residual predicate retains only the one for "badger".
Now try
SELECT COUNT(*)
FROM dbo.TEMP
WHERE TESTCOLUMN = N''
OPTION (RECOMPILE)
The GetRangeThroughConvert
function appears to give up when asked to provide a range for an empty string and output (null, null, 0)
.
The null
here indicate that that end of the range is unbounded so effectively the index seek just ends up reading the whole index from first row to last.

the above shows the index seek read all 56 rows but the residual predicate did the job of removing all those not matching TESTCOLUMN = N''
(so the operator returns zero rows).
In general the seek predicate used here seems to act like a prefix search (e.g. the seek [TESTCOLUMN] = N'A'
will read at least all rows starting with A
with the residual predicate doing the equality check) so my expectations for empty string here would not be high in the first place but Paul White indicates that the range being seeked here is likely a bug anyway.
When you add the OR
predicate to the query the execution plan changes.
It now ends up getting two outer rows to the nested loops join and so ends up doing two seeks (two executions of the seek operator on the inside of the nested loops).
One for the TESTCOLUMN = N''
case and one for the TESTCOLUMN IS NULL
case. The values used for the TESTCOLUMN = N''
branch are still calculated through the GetRangeThroughConvert
call (as this is the only way SQL Server can do a seek for this mismatched datatype case) so still have the expanded range including NULL
.
The problem is that the residual predicate on the index seek now also changes.
It is now
CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[TEMP].[TESTCOLUMN],0)=N''
OR [tempdb].[dbo].[TEMP].[TESTCOLUMN] IS NULL
The previous residual predicate of
CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[TEMP].[TESTCOLUMN],0)=N''
would not be suitable as this would incorrectly remove the rows with NULL
that need to be retained for the OR TESTCOLUMN IS NULL
branch.
This means that when the seek for the N''
branch is done it still ends up reading all the rows with NULL
as before but the residual predicate no longer is fit for purpose at removing these.
It might also seem a bit of a miss that the merge interval in the problem plan does not merge the overlapping ranges for the index seeks.
I assume this does not happen due to the different flags values from the two branches. Expr1014
has a value of 60
for the IS NULL
branch and 0
for the = N''
branch.