
I have 2 queries that are essentially the same (unless I missed something).

DECLARE @siloIds SiloIdsTableType
INSERT INTO @siloIds VALUES 
(1),(2),(3)

-- Query 1
SELECT *
FROM [Transaction]
WHERE 
      SiloId IN (1,2,3)
  AND Time > '2000-02-01'

-- Query 2
SELECT *
FROM [Transaction]
WHERE 
       SiloId IN (select SiloId from @siloIds)
   AND Time > '2000-02-01'

I was thinking that nothing could beat constants hard-coded in the query itself, but apparently the first query is a few times slower than the second one. It seems that SQL Server is not smart enough to produce a good plan for the hardcoded values, or am I missing something here?

It seems that one should not use WHERE ... IN with a long list of constants, and that a TVP should always be favored.

P.S. I use a thousand values instead of 1,2,3 in my actual query.

P.P.S. I have a non-clustered index on (SiloId ASC, Time ASC), but the first query is not using it, favoring a clustered index scan for some reason.

P.P.P.S. The execution plan splits the estimated cost 14% to 86% in favor of the second query.

Execution plan:

(execution plan screenshot)

Ilya Chernomordik
  • Post the execution plan and IO statistics. For all one knows, you may be seeing the effects of caching the data the first time – Panagiotis Kanavos Jan 13 '21 at 17:34
  • I ran the query 5 times in a row to check for that – Ilya Chernomordik Jan 13 '21 at 17:36
  • In fact, the exact opposite happens. There are no statistics for table variables, unlike hard-coded values. The optimizer assumes `@siloIds` contains only a single row. There are differences from one SQL Server version to the next, but generally the optimizer has no way of knowing what's in a table variable – Panagiotis Kanavos Jan 13 '21 at 17:36
  • `5 times in a row` which means the first query pays the price and all others just read from RAM. No matter how many times the queries are run, the first one will always appear a lot slower. Assuming `Time` is covered by an index, the main delay comes from IO – Panagiotis Kanavos Jan 13 '21 at 17:37
  • How come it uses the right index when assuming there is 1 row, but does not with 1000 hardcoded values? – Ilya Chernomordik Jan 13 '21 at 17:38
  • Why do you assume this happens at all? It's far, far more likely that the first execution is 100 times slower than the next ones that just use cached data, making the first option appear slower. *Post the execution plans and the IO statistics* – Panagiotis Kanavos Jan 13 '21 at 17:39
  • If you want to benchmark queries, you have to make sure you clear *all* caches before *each* execution – Panagiotis Kanavos Jan 13 '21 at 17:39
  • I did not figure out how to clear the cache properly on Azure SQL. I ran it like this: 1st query, 2nd query, 1st query, 2nd query, etc., a few times in a row. The results are consistent and the hardcoded values are always slower, running a clustered index scan. Changing the order did not affect the results either – Ilya Chernomordik Jan 13 '21 at 17:41
  • Which is why you need to post some actual data - the execution plans and IO statistics. The fact you use Azure SQL matters as well. Azure SQL uses shared machines and an idle machine is allocated fewer resources compared to one that's active for some time. Right now there's no information at all – Panagiotis Kanavos Jan 13 '21 at 17:42
  • I have posted the plans, what is the IO statistics you are referring to and how can I get it? – Ilya Chernomordik Jan 13 '21 at 17:43
  • Now the difference is obvious, and the question's code doesn't match the execution plan. There are 1K rows, not just 5, and the table parameter has an *index*. Which is far better than statistics on an unindexed column – Panagiotis Kanavos Jan 13 '21 at 17:45
  • I have written that I use 1000 values, but I could not paste them all without making the question too long. I tried with and without the index; it did not change the result (it was always slower), which is why I omitted it from the question – Ilya Chernomordik Jan 13 '21 at 17:48
  • Isn't this an inefficient way to do an inner join? I would pass a TVP and do the join there. – Tanveer Badar Jan 13 '21 at 17:52
  • @TanveerBadar I tried that, plans were exactly the same (you mean join compared to select ? ), execution time was the same as well – Ilya Chernomordik Jan 13 '21 at 18:03
  • The table parameter has an *index* and a lot of values. That's what makes the difference - the index. This allows the server to optimize the join and produce different execution plans – Panagiotis Kanavos Jan 13 '21 at 18:15
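For reference, the IO statistics asked about in the comments can be captured with standard session options; this is a minimal sketch using the first query from the question (compare the "logical reads" per table between the two queries):

```sql
-- Report per-table reads and CPU/elapsed time for each statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM [Transaction]
WHERE SiloId IN (1, 2, 3)
  AND Time > '2000-02-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```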

1 Answer


When you use a table variable (or a TVP, which is the same thing), SQL Server uses a fixed estimate that it will only get 1 row out of it (more on this below): a cardinality of 1. This means it assumes the SiloId join filter is very selective, so it prioritizes it, doing a nested loop join to get just those rows and filtering afterwards on Time.

Whereas when you use constants, the exact size is hard-coded. For whatever reason (probably bad statistics), the optimizer assumes that Time is more selective, and therefore prioritizes that over the other filter.

Where the table variable plan falls down is when there are a lot of rows in it, or in the main table, because then you get lots of key lookups, which can be slow.
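One way to mitigate those key lookups is a covering index, though it only helps if you can narrow the SELECT list (a `SELECT *` cannot be covered without including every column). A sketch, where `Amount` stands in for whichever columns you actually select:

```sql
-- Hypothetical covering index: Amount is an assumed column name.
-- Seeking on (SiloId, Time) and including the selected columns
-- lets the plan avoid key lookups into the clustered index.
CREATE NONCLUSTERED INDEX IX_Transaction_SiloId_Time
ON [Transaction] (SiloId, Time)
INCLUDE (Amount);
```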

Ideally you want the compiler to know the size of the table variable up front. You can do this in a number of ways, as Brent Ozar explains:

  • trace flag 2453, which causes a recompile if the actual cardinality is very different (a good idea if you can risk a trace flag)
  • OPTION (RECOMPILE) (this recompiles every time, which may be inefficient in itself)
  • a temporary table (not possible as a parameter)
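The last two options from the list above might look like this, sketched against the query in the question:

```sql
-- Option 2: recompile so the optimizer sees the actual
-- row count in @siloIds at compile time
SELECT *
FROM [Transaction]
WHERE SiloId IN (SELECT SiloId FROM @siloIds)
  AND Time > '2000-02-01'
OPTION (RECOMPILE);

-- Option 3: a temp table, which (unlike a table variable)
-- does get column statistics
CREATE TABLE #siloIds (SiloId int PRIMARY KEY);
INSERT INTO #siloIds SELECT SiloId FROM @siloIds;

SELECT *
FROM [Transaction]
WHERE SiloId IN (SELECT SiloId FROM #siloIds)
  AND Time > '2000-02-01';
```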
Charlieface
  • strange that when it assumes there is just one row it uses the proper index, but with 1000 rows it does not :) – Ilya Chernomordik Jan 13 '21 at 18:07
  • As I explain, it's because of the cardinality calculation. 1000 key lookups is a fair amount for a small table. Have you tried `update statistics [Transaction] with fullscan` – Charlieface Jan 13 '21 at 18:08
  • I have tried on another data set, then the hardcoded where in provided the proper plan with an index seek, then the execution time was about the same. I have 1 mln rows in first data set and 200 mln in the second, so they are quite big – Ilya Chernomordik Jan 13 '21 at 18:10
  • @Charlieface the table parameter has an index that only appears in the execution plan. Even assuming there's only one row in the table parameter, using an index can result in a far better execution plan – Panagiotis Kanavos Jan 13 '21 at 18:16
  • @PanagiotisKanavos Yeah I know, in this case though it's just a happy coincidence. The other plan should have been nested loops also. I usually create indexes on my table types because I know/ensure the data will be unique. Wouldn't I wish for proper statistics, and for temp table parameters! – Charlieface Jan 13 '21 at 22:44