Why does the optimizer choose a keylookup instead of 2 separate queries?

Question

I have a table that has a primary key/clustered index on an ID column and a nonclustered index on a system date column. If I query all the columns from the table using the system date column (covering index wouldn't make sense here) the execution plan shows a key lookup because for each record it finds it has to go the the ID to get all of the column data.

The weird thing is, if I write 2 queries with a temp table it performs much faster. I can query the system date to get a table of ID's and then use that table to search the ID column. This makes sense because you're no longer doing the slow key lookup for each record.

Why doesn't the optimizer do this for us?

--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index  

select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01'

--faster version 
--id primary key/clustered index
--systemdate nonclustered index  

select ID, SystemDate
into #myTempTable
from MyTable
where SystemDate > '2019-01-01'

select t1.ID, t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.SystemDate
from MyTable t1
inner join #myTempTable t2
on t1.ID = t2.ID

score 1 · Answer 1 · answered Aug 24 '19 at 19:14

1

Well, in second case you're actually doing a key lookup yourself, aren't you? ; )

Optimizer could perform slower due to outdated (or missing) statistics, fragmented index.

To tell you why it's actually slower, it's best if you'd paste your execution plans here. This would be way easier to explain what happens.

answered Aug 24 '19 at 19:14

Evaldas Buinauskas

13,739
11
55
107

The query plan for the lookup looks like the one posted below in Lukasz's reply. I think this is the problem. A key lookup will do a nested loop join whereas if you separate the queries out like I did above the optimizer performs either a merge/hash join. I think that's the reason why the key lookup performs so much slower, but I could be wrong. – schulzey Aug 26 '19 at 01:09
Ye, but could you paste plan where you use temporary table? You could then see whether it does a nested loop or a merge/hash join? But what you're suggesting sounds about right for me. – Evaldas Buinauskas Aug 26 '19 at 05:52

Lukasz Szozda · Answer 2 · 2019-08-24T20:03:58.260

Query optimizer chooses key lookup because the query is not supported by covering index. It has to grab missing columns from table itself:

/*
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index 
*/
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';

Adding a covering index should boost the performance:

CREATE INDEX my_idx ON MyTable(SystemDate) INCLUDE(col1, col2, col3, col4, col5);

db<>fiddle demo

For query without JOIN:

select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable  -- single table
where SystemDate > '2019-01-01';

There is JOIN in execution plan:

After introducing covering index there is no need for additional key lookup:

I understand this. However, if you work in a large enough organization where all kinds of queries are hitting the same tables, putting a covering index on an entire table each time someone grabs something more than just what's in the index isn't viable. What I'm wondering is why writing as two separate queries is faster than using the one query that performs a key lookup. — schulzey, Aug 26 '19 at 01:06

Why does the optimizer choose a keylookup instead of 2 separate queries?

2 Answers2