I've been looking at trying to optimise (or at least change) some EF code in C# to use stored procedures, and found what seems to be an anomaly (or something new to me) when finding rows matching a constant list. The typical manually-generated short query would be something like...
SELECT Something FROM Table WHERE ID IN (one, two, others);
We had an EF query that we were replacing with a stored procedure call, so I looked at the output, saw that it was complex and thought my simpler query (similar to the above) would be better. It wasn't. Here is a quick demo that reproduces this.
Can anyone explain why the execution plans for the final version - with the
...WHERE EXISTS(... (SELECT 1 AS X) AS Alias UNION ALL...) AS Alias...)
construct is better - seemingly because it omits the costly SORT operation, even though the plan includes TWO index scans rather than the one of the simpler query.
Here's a self-contained example script (I hope)...
USE SandBox; -- a dummy database, not a live one!
-- create our dummy table, dropping first if it exists
IF EXISTS (SELECT NULL FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Test')
DROP TABLE Test;
CREATE TABLE Test (Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY, FormId INT NOT NULL, DateRead DATE NULL);
-- populate with some data
INSERT INTO Test VALUES (1, NULL), (1, GETDATE()), (1, NULL), (4, NULL), (5, NULL), (6, GETDATE());
-- Simple query that I might typically use
-- how many un-read entries are there for a set of 'forms' of interest, 1, 5 and 6
-- (we're happy to omit forms with none)
SELECT T.FormId, COUNT(*) AS TheCount
FROM Test AS T
WHERE T.FormId IN (1, 5, 6)
AND T.DateRead IS NULL
GROUP BY T.FormId;
-- This is the first step towards the EF-generated code
-- using an EXISTS gives basically the same plan but with constants
SELECT T.FormId, COUNT(*) AS TheCount
FROM Test T
WHERE EXISTS ( SELECT NULL
FROM (VALUES (1), (5), (6)
) AS X(FormId)
WHERE X.FormId = T.FormId
)
AND T.DateRead IS NULL
GROUP BY T.FormId;
-- A step closer, using UNION ALL instead of VALUES to generate the 'table'
-- still the same plan
SELECT T.FormId, COUNT(*) AS TheCount
FROM Test T
WHERE EXISTS ( SELECT NULL
FROM ( SELECT 1
UNION ALL
SELECT 5
UNION ALL
SELECT 6
) AS X(FormId)
WHERE X.FormId = T.FormId
)
AND T.DateRead IS NULL
GROUP BY T.FormId;
-- Now what the EF actually generated (cleaned up a bit)
-- Adding in the "FROM (SELECT 1 as X) AS alias" changes the execution plan considerably and apparently costs less to run
SELECT T.FormId, COUNT(*) AS TheCount
FROM Test T
WHERE EXISTS ( SELECT NULL
FROM ( SELECT 1 FROM (SELECT 1 AS X) AS X1
UNION ALL
SELECT 5 FROM (SELECT 1 AS X) AS X2
UNION ALL
SELECT 6 FROM (SELECT 1 AS X) AS X3
) AS X(FormId)
WHERE X.FormId = T.FormId
)
AND T.DateRead IS NULL
GROUP BY T.FormId;
Can anyone help me to understand why and if there is a benefit in wider use for this kind of query format?
I looked around for anything special in (SELECT 1 AS X)
stuff and though many show it as being common in EF output, I couldn't see anything about this particular apparent benefit.
Thanks in advance,
Keith