
I have a table with a VARCHAR column and an index on it. Whenever a SELECT COUNT(*) is run against this table with a check for COLUMN = N'' OR COLUMN IS NULL, it returns double the number of rows. A SELECT * with the same WHERE clause returns the correct number of records.

After reading this article: https://sqlquantumleap.com/2017/07/10/impact-on-indexes-when-mixing-varchar-and-nvarchar-types/ and doing some testing, I believe the collation of the column and the implicit conversion aren't at fault (at least not directly). The collation of the column is Latin1_General_CI_AS.

The database is on SQL Server 2012, and I've tested on 2016 as well.

I've created a test script (below) that demonstrates the problem. While writing it, I found that the table needs a reasonable amount of data before the issue occurs, so I believe it may be related to data paging.

CREATE TABLE [dbo].[TEMP]
(
    ID [varchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
    [DATA] [varchar](200) COLLATE Latin1_General_CI_AS NULL,
    [TESTCOLUMN] [varchar](50) COLLATE Latin1_General_CI_AS NULL,
    CONSTRAINT [PK_TEMP] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO

CREATE NONCLUSTERED INDEX [I_TEMP_TESTCOLUMN] ON dbo.TEMP (TESTCOLUMN ASC)
GO

DECLARE @ROWS AS INT = 40; 

WITH NUMBERS (NUM) AS 
(
    SELECT 1 AS NUM
    UNION ALL
    SELECT NUM + 1 FROM NUMBERS WHERE NUM < @ROWS
)
INSERT INTO TEMP (ID, DATA)
SELECT NUM, '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901324561234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890' 
FROM NUMBERS

SELECT @ROWS AS EXPECTED, COUNT(*) AS ACTUALROWS
FROM TEMP
GO

SELECT COUNT(*) AS INVALIDINDEXSEARCHCOUNT
FROM TEMP
WHERE (TESTCOLUMN = N'' OR TESTCOLUMN IS NULL)
GO

DROP TABLE TEMP

I'm able to modify the database to some extent (I can't change the data, or stop the column from allowing NULL). Unfortunately, I am not able to modify the code doing the search. Can anyone identify a way to get the correct COUNT(*) results returned?

marc_s
Lagfrak
  • Here is a db<>fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f144694f9c136c9b46dc3661f700a6d7. – Gordon Linoff Mar 30 '20 at 01:21
  • Only happens with index `I_TEMP_TESTCOLUMN` also. – Dale K Mar 30 '20 at 01:24
  • Reproducible on Win 7, SQL Server 2014 SP3. This is one nasty bug, especially since MS is [recommending Windows collations](https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#SQL-collations). Please post MS bug ID so that we can assist / upvote it. – Alex Mar 30 '20 at 04:32
  • Putting your code into a fiddle, the result is 80 as long as your collation is in the script. As soon as you remove the collation from TESTCOLUMN, the result is the expected 40. Anyway, changing your script to `WHERE (ISNULL(TESTCOLUMN, N'') = N'')` and removing the `OR` will solve the issue as well. – Tyron78 Mar 30 '20 at 06:47
  • Using `WHERE NULLIF(TESTCOLUMN, N'') IS NULL` is OK. – gotqn Mar 30 '20 at 06:55
  • `(TESTCOLUMN = '' OR TESTCOLUMN IS NULL)` (without the `N`) returns the correct result. Since you can't alter the query but can alter the table, how about: `DROP INDEX [dbo].[TEMP].[I_TEMP_TESTCOLUMN]; ALTER TABLE [dbo].[TEMP] ALTER COLUMN [TESTCOLUMN] [nvarchar](50) COLLATE Latin1_General_CI_AS NULL; CREATE NONCLUSTERED INDEX [I_TEMP_TESTCOLUMN] ON dbo.TEMP (TESTCOLUMN ASC);` ? – AlwaysLearning Mar 30 '20 at 08:47
  • Thanks everyone for the feedback. I've raised a ticket with [Microsoft](https://feedback.azure.com/forums/908035-sql-server/suggestions/40066435-count-on-varchar-index-with-blank-nvarchar-or-n) – Lagfrak Mar 31 '20 at 23:14
  • @AlwaysLearning, I've considered changing the column to `NVARCHAR`, however none of the expected data is Unicode, and while it's primarily happening on one of my main columns, it could happen to any column in the database that fits these criteria, so I'd just be waiting for the problem to recur somewhere else. – Lagfrak Mar 31 '20 at 23:22
  • The Microsoft link [given above](https://stackoverflow.com/questions/60922793/count-on-varchar-index-with-blank-nvarchar-or-null-check-results-in-double-th#comment107852594_60922793) has changed to https://feedback.azure.com/d365community/idea/8020f59a-4a25-ec11-b6e6-000d3a4f0da0. – GSerg Dec 04 '22 at 21:41
  • @GSerg - I've created a different one, as that one was lacking repro details (or even the details necessary to produce the repro, such as collation): https://feedback.azure.com/d365community/idea/d12ba420-6d74-ed11-a81b-000d3ae49307 - the new one also pinpoints the problem better. – Martin Smith Dec 05 '22 at 07:56
  • @Alex - fairly nasty but the best practice is not to compare `varchar` columns with `nvarchar` values anyway. Even without the bug it can cause sub optimal results – Martin Smith Dec 05 '22 at 20:22
  • @MartinSmith - I agree that it is not a good practice BUT we should not even be having this discussion 2 years later. This is a silent failure to produce correct results in a `SELECT` statement of all things! I do often wonder what substances are ingested by people responsible for bug prioritization. – Alex Dec 06 '22 at 05:42

3 Answers


TLDR: This is a bug in the product (reported here).

The poor practice that exposes this bug is mismatched datatypes (a varchar column being compared to an nvarchar value) - with SQL collations this would just cause an implicit cast of the column to nvarchar and a full scan.

With Windows collations this can still result in a seek. This is generally a useful performance optimisation, but here you have hit an edge case...
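As a quick illustration of the mismatch (a sketch, using the TEMP table from the question, and confirmed by the comments above): comparing against a varchar literal keeps the types matched, needs none of the dynamic-seek machinery described below, and returns the correct count.

```sql
-- Same predicate with a varchar literal (no N prefix): the datatypes match,
-- so a plain index seek is possible and the count comes back correct.
SELECT COUNT(*)
FROM dbo.TEMP
WHERE (TESTCOLUMN = '' OR TESTCOLUMN IS NULL)
OPTION (RECOMPILE);
```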


More detail: use the setup below...

CREATE TABLE dbo.TEMP 
(
    ID INT IDENTITY PRIMARY KEY,
    [TESTCOLUMN] [varchar](50) COLLATE Latin1_General_CI_AS NULL INDEX [I_TEMP_TESTCOLUMN],
    Filler AS CAST('X' AS CHAR(8000)) PERSISTED
)

--Add 7 rows where TESTCOLUMN is NOT NULL
INSERT dbo.TEMP([TESTCOLUMN]) VALUES ('aardvark'), ('badger'), 
                                     ('badges'), ('cat'), 
                                     ('dog'), ('elephant'), 
                                     ('zebra');

--Add 49 rows where TESTCOLUMN is NULL 
INSERT dbo.TEMP([TESTCOLUMN]) 
SELECT NULL 
FROM dbo.TEMP T1 CROSS JOIN dbo.TEMP T2

Then first look at the actual execution plan for

SELECT COUNT(*) 
FROM dbo.TEMP
WHERE TESTCOLUMN = N'badger'
OPTION (RECOMPILE)

(Execution plan: a Compute Scalar calling GetRangeThroughConvert feeds a Nested Loops join that drives an Index Seek.)

With SQL collations the implicit cast to nvarchar would make the predicate entirely unsargable. With Windows collations, SQL Server is able to add apparatus to the plan where a Compute Scalar calls an internal function, GetRangeThroughConvert(N'badger',N'badger',(62)), and the resultant values end up being fed into a nested loops join to give start and end points for an index seek. (The article "Dynamic Seeks and Hidden Implicit Conversions" has some more details about this plan shape.)

The execution plan does not expose the range start and end values that this internal function returns, but it is possible to see them if you happen to have a SQL Server build available where the short-lived query_trace_column_values extended event has not been disabled. In the case above the function returns (badger, badgeS, 62), and these values are used in the index seek. As I added a row with the value "badges", in this case the seek ends up reading one more row than strictly necessary, and the residual predicate retains only the one for "badger".

Now try

SELECT COUNT(*) 
FROM dbo.TEMP
WHERE TESTCOLUMN = N''
OPTION (RECOMPILE)

The GetRangeThroughConvert function appears to give up when asked to provide a range for an empty string and outputs (null, null, 0).

A null here indicates that that end of the range is unbounded, so effectively the index seek just ends up reading the whole index from first row to last.

(Execution plan: the Index Seek reads all 56 rows; the residual predicate removes every one of them.)

The plan above shows the index seek read all 56 rows, but the residual predicate did the job of removing all those not matching TESTCOLUMN = N'' (so the operator returns zero rows).

In general the seek predicate used here seems to act like a prefix search (e.g. the seek [TESTCOLUMN] = N'A' will read at least all rows starting with A, with the residual predicate doing the equality check), so my expectations for an empty string here would not be high in the first place. Paul White indicates that the range being seeked here is likely a bug anyway.
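The prefix-like behaviour described above can be seen with a single-character value against the 56-row setup (a sketch; the exact rows read are inferred from the behaviour described, not guaranteed on every build):

```sql
-- No row equals 'b', but the seek range should cover at least the rows
-- starting with 'b' ('badger' and 'badges'); the residual predicate then
-- discards them, so the count is 0 while the seek still reads those rows.
SELECT COUNT(*)
FROM dbo.TEMP
WHERE TESTCOLUMN = N'b'
OPTION (RECOMPILE);
```

Comparing the "Actual Number of Rows" on the Index Seek with the operator's output makes the prefix-style range visible.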

When you add the OR predicate to the query the execution plan changes.

It now ends up supplying two outer rows to the nested loops join, and so performs two seeks (two executions of the seek operator on the inside of the nested loops).

One for the TESTCOLUMN = N'' case and one for the TESTCOLUMN IS NULL case. The values used for the TESTCOLUMN = N'' branch are still calculated through the GetRangeThroughConvert call (as this is the only way SQL Server can do a seek for this mismatched-datatype case), so that seek still has the expanded range including NULL.

The problem is that the residual predicate on the index seek now also changes.

It is now

CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[TEMP].[TESTCOLUMN],0)=N'' 
OR [tempdb].[dbo].[TEMP].[TESTCOLUMN] IS NULL 

The previous residual predicate of

CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[TEMP].[TESTCOLUMN],0)=N''

would not be suitable as this would incorrectly remove the rows with NULL that need to be retained for the OR TESTCOLUMN IS NULL branch.

This means that when the seek for the N'' branch runs, it still ends up reading all the rows with NULL as before, but the residual predicate is no longer fit for purpose at removing them - so the NULL rows are returned by both seeks and counted twice.
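Putting this together with the setup above: the 49 NULL rows satisfy the IS NULL branch, and the unbounded seek for the N'' branch reads them again without a residual predicate able to discard them (a repro sketch; the doubled figure is what affected builds produce, per the question):

```sql
-- Correct answer: 49 (the NULL rows; no row equals '').
-- On builds affected by the bug, each NULL row is returned by both the
-- IS NULL seek and the unbounded N'' seek, doubling the count to 98.
SELECT COUNT(*)
FROM dbo.TEMP
WHERE (TESTCOLUMN = N'' OR TESTCOLUMN IS NULL)
OPTION (RECOMPILE);
```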

It might also seem a bit of a miss that the merge interval in the problem plan does not merge the overlapping ranges for the index seeks.

I assume this does not happen due to the different flag values from the two branches: Expr1014 has a value of 60 for the IS NULL branch and 0 for the = N'' branch.

Martin Smith

In my test, which was on SQL Server 2019, when one removes the N and just compares against '' or NULL, the double counting goes away.

SELECT COUNT(*) AS ACTUALROWS
FROM TEMP 
WHERE (TESTCOLUMN = '' OR TESTCOLUMN IS NULL)

The N prefix, which marks a Unicode literal, is inappropriate here anyway, as the search column is not of type NVARCHAR. If TESTCOLUMN were of type NVARCHAR, the count would be correct.

Eric Kassan

Eric Kassan's answer is correct: the column in the table is VARCHAR, but you are searching as if the column were NVARCHAR.
These are two different datatypes, so either the column should be changed to NVARCHAR, or the query should be changed by removing the N.
Why the result is doubled when comparing different datatypes is interesting, but that was not the question. :)

Johnny
  • `but that was not the question` - it very specifically was. [Eric Kassan's answer](https://stackoverflow.com/a/74378560/11683), which you are mentioning, is pointless, too. This is a bug in SQL Server. – GSerg Dec 04 '22 at 21:42