
I have this plan that took ~18 seconds to run (generated by SentryOne Plan Explorer): [execution plan screenshot]

The index seek uses a non-clustered index with some seek predicates, while the Filter operator applies some other predicates, including the LIKE condition.

This is the query (kind of):

SELECT  TOP(20) *
FROM    Table
WHERE   Table.Col1 = ''
        AND Table.Col2 LIKE '%' + @searchString + '%'
ORDER BY Table.Col3

1st (unrelated) question:

How do you read an execution plan?

As I read it, SQL Server begins with the leftmost operator, Top, which calls its child operator (node) on the right to get a single row, and it keeps doing that until it meets its target (20 rows). If there is no Top, it keeps going until the child node says there are no rows left. That child node in turn calls its own child node. So the Filter has called the index seek 2.9 million times to fetch 2.9 million rows in the plan above. I hope I'm correct?

Then I made a small change to the query:

SELECT  TOP(20) *
FROM    Table
WHERE   Col1 = ''
        AND Col2 LIKE @searchString + '%'  -- search for string prefix only
ORDER BY Table.Col3

This time the query took < 2 seconds to run. The actual execution plan is exactly the same as the one above; I compared the two in SSMS. The index seek is still the most costly operator, and it returned the same number of rows (2.9 million).

2nd question:

The change I made to the LIKE condition must have removed ~16 seconds of execution time. So why is the index seek marked as the most costly operator? Shouldn't it be the Filter, since it's the one that handles the LIKE search? If I hadn't run this experiment, how would I know the LIKE condition is the culprit here?

3rd question:

How do I know whether the amount of time SQL Server takes to handle one task is reasonable or not?

I'm asking this because I'm new to optimizing queries. The first time I saw the plan, I didn't know whether an index seek over 3 million rows that costs 18 seconds is normal, or whether something is wrong with it.

Hp93
  • The latter query doesn't have a leading wildcard, meaning the query is SARGable; that's a big difference. – Thom A Aug 27 '20 at 18:56
  • One question per question, please. Q1 is too broad for SO; you need to do some research. – Dale K Aug 27 '20 at 20:16

1 Answer


Reading execution plans

From right to left. SQL Server creates an execution plan that starts at the right-hand side (the side where the data is).

So in your example, the index seek happens first and streams N rows of data to the next operator on its left; the Filter then processes those rows in memory (the buffer pool), and so on.

To see this in action you can turn on Live Query Statistics in SSMS. This way you can watch the data flowing through each operator. It's great for seeing the query engine in action.
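
If you prefer a query over the SSMS UI, a minimal sketch (assuming SQL Server 2014 or later, and that the monitored session produces an actual plan, e.g. it ran SET STATISTICS XML ON) is to poll sys.dm_exec_query_profiles from a second session while the slow query runs; the session id 55 below is only a placeholder:

-- Run from a separate session while the slow query is executing.
-- 55 is a hypothetical session_id; use the real @@SPID of the monitored session.
SELECT  node_id,
        physical_operator_name,
        row_count,           -- rows produced by this operator so far
        estimate_row_count   -- the optimizer's estimate, for comparison
FROM    sys.dm_exec_query_profiles
WHERE   session_id = 55
ORDER BY node_id;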

Cost

Cost is relative (based on the estimated subtree cost). Fetching data from disk (SSD or HDD) is I/O, and I/O is still an expensive operation. The index seek you see is already taking care of the filtering before it retrieves the data from disk; otherwise you would have a full table scan or an index scan. Think of it as a phone book: if you look up a person by last name, you can skip most of the pages, because you know the first letter(s) of the person's name and the phone book is ordered by name alphabetically.

The Filter in your example doesn't have a high cost, since all rows are already in memory and the index seek has probably filtered out N rows already.
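
As a purely hypothetical illustration (the index below is not from the question; it reuses the generic table and column names from the query above), this is the kind of index that lets the seek do most of the work: the equality on Col1 becomes the seek predicate, the key order on Col3 satisfies the ORDER BY, and the LIKE is left as a residual predicate on the included column:

-- Hypothetical index, shown only to illustrate seek predicate vs. residual predicate.
CREATE NONCLUSTERED INDEX IX_Table_Col1_Col3
    ON dbo.[Table] (Col1, Col3)
    INCLUDE (Col2);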

The right amount of time

Time is relative. It depends on:

  • Hardware: A query on your SQL Server can take 16 seconds, but running it on a machine with 128 GB of memory, while all the data is already in the buffer pool, may take < 100 ms.
  • Buffer cache: All data, or a part of the data can already be cached in memory
  • Execution plan cache: If it is the first execution (or OPTION(RECOMPILE) is used), SQL Server needs to quickly guess the cheapest execution plan. The second execution of this query can reuse the plan, but that plan might not be a good one for the newly passed parameters.
  • Parallelism: Queries can go parallel to decrease the overall execution time but increase CPU time, depending on your cores and the configuration of max degree of parallelism and cost threshold for parallelism (see the sketch after this list).
  • Index fragmentation: Indexes can be fragmented when lots of updates / deletes happen.
  • Statistics: Statistics are used to guess how many rows will be returned; these estimates can be wrong in many cases (statistics aren't up to date, data variety in the indexes, the 201-bucket problem, ...).
  • Locking & isolation level: While you try to read data, a different transaction might be changing it; the resulting locks can increase your execution time.
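
A minimal way to check the two parallelism settings mentioned above (a read-only query against the instance configuration; nothing here changes any setting):

SELECT  name, value_in_use
FROM    sys.configurations
WHERE   name IN ('max degree of parallelism', 'cost threshold for parallelism');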

If you want a correct comparison number, you can use SET STATISTICS IO ON to turn on statistics IO. If you open the Messages tab in SSMS after you run a query, you will see something like:

Table 'Customer'. Scan count 1, logical reads 4152, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Logical reads is the key indicator. It indicates how many 8 KB pages were read to create your desired result.

To get an idea of how many logical reads can happen on a table (on the clustered index), you can do a simple SELECT COUNT(*) FROM table with statistics IO turned on and compare its logical reads against your query's.
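
A minimal sketch of that comparison, reusing the question's generic table and column names (and a made-up search value), could look like this:

SET STATISTICS IO ON;

-- Baseline: roughly how many pages are read when every row has to be touched.
SELECT COUNT(*) FROM dbo.[Table];

-- The actual query: compare its logical reads against the baseline above.
DECLARE @searchString nvarchar(100) = N'abc';  -- hypothetical value
SELECT  TOP(20) *
FROM    dbo.[Table]
WHERE   Col1 = ''
        AND Col2 LIKE @searchString + '%'
ORDER BY Col3;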

When a predicate is non-SARGable (in your example, LIKE '%' + @Param + '%'), you can end up with more logical reads than there are pages in your table.
Going back to the phone book example: if you want to retrieve all persons whose last name contains 'Do', you need to scan the whole book, which is non-SARGable. If you change it to 'starts with', you can do an index seek, because you know where the first occurrence of 'Do' is and where the first occurrence of 'Dp' is.

Preben Huybrechts
  • If data flows from right to left, how does SQL know when to stop reading data in a `SELECT TOP` query? Does SQL just guess that the 20 rows are located somewhere in the first 10k rows? So it'll read 10k rows instead of the whole table and pass that to the next step? – Hp93 Sep 11 '20 at 10:06
  • This depends on the ORDER BY. If you didn't specify an ORDER BY but did specify a TOP, it will scan the whole table/index. If you have an ORDER BY and a matching index, the engine calculates how many rows are returned from a seek. You can test this by running your query with and without the ORDER BY and comparing the execution plans. – Preben Huybrechts Sep 11 '20 at 10:12
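
A minimal sketch of the test described in the comment above, again with the question's generic names and a made-up search value (enable the actual execution plan in SSMS for both statements):

DECLARE @searchString nvarchar(100) = N'abc';  -- hypothetical value

-- With ORDER BY: the engine may need a sort (or a supporting index) before TOP can stop.
SELECT  TOP(20) *
FROM    dbo.[Table]
WHERE   Col1 = '' AND Col2 LIKE @searchString + '%'
ORDER BY Col3;

-- Without ORDER BY: TOP can stop reading as soon as 20 matching rows are found.
SELECT  TOP(20) *
FROM    dbo.[Table]
WHERE   Col1 = '' AND Col2 LIKE @searchString + '%';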