How to improve performance on a clustered index seek

Question

I'm trying to improve the performance on a query that is running very slowly. After going through the Actual Execution Plan; I found that a Clustered Index Seek was taking up 82%. Is there any way for me to improve the performance on an Index Seek?

Index:

/****** Object:  Index [IX_Stu]    Script Date: 12/28/2009 11:11:43 ******/
CREATE CLUSTERED INDEX [IX_Stu] ON [dbo].[stu] 
(
 [StuKey] ASC
)WITH (PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Table (some columns omitted for brevity):

CREATE TABLE [dbo].[stu](
 [StuCertKey] [int] IDENTITY(1,1) NOT NULL,
 [StuKey] [int] NULL
 CONSTRAINT [PK_Stu] PRIMARY KEY NONCLUSTERED 
(
 [StuCertKey] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

Is it a bad idea for me to have the Clustered Index on something other than the primary key? The query never uses the Primary key so I figured it would be better to create the clustered index on the column that is joined on the most (StuKey) — Abe Miessler, Dec 28 '09 at 19:21
Could you post the query. Also, many rows are in the table and approx how many are being returned by the query? — dan, Dec 28 '09 at 19:22
The clustered index does not need to be on the primary key; however, that's often a sign that the PK itself is redundant. If you have a secondary index on the PK that's never used, you're hurting overall performance. — Aaronaught, Dec 28 '09 at 19:25
The table has approximately 8 millions rows. There are about 6 million distinct StuKey values in that table. The query returns about 50 rows and is much more complicated than the piece i've presented here. — Abe Miessler, Dec 28 '09 at 19:27
If it's not unique, you normally shouldn't put the clustered index on it. Use a regular index and INCLUDE whatever columns you need it to cover. — Aaronaught, Dec 28 '09 at 19:33
Hmm, how would I determine what columns I need to cover? Should I just include anything from the Stu table that is in the SELECT of the query? Can you use the "output list" from the execution plan to determine what columns should be covered? — Abe Miessler, Dec 28 '09 at 19:46
I haven't used SQL*Server/Sybase for many years; as I recall, the clustered index used the actual data pages as leaf nodes. This implies that a "wide" table (lots of columns, or large columns) will be inefficient, particularly if there are duplicates: you may have to traverse multiple data blocks to find the desired rows. With a non-clustered index, the leaf nodes will contain significantly more entries per page. — kdgregory, Dec 28 '09 at 19:47
However, looking at your comment re 11 tables in a join, I suspect that the real issue is that the optimizer is picking a poor join sequence. Depending on the size of intermediate results, this may suggest the use of temporary tables, materialized views (does SQL-Server have those?) or other techniques to pre-compute portions of the query. — kdgregory, Dec 28 '09 at 19:50
@Abe: Your index needs to cover any columns that you subsequently SELECT or JOIN on when you perform lookups on that index. In some cases, that might be every other column, but keep in mind that this makes the index much larger and again, degrades overall performance. If your index doesn't cover the output columns then you end up with a relatively expensive Bookmark or RID lookup. — Aaronaught, Dec 28 '09 at 20:00
@kdgregory, SQL Server has had materialized views for as long as I can remember (and hasn't been called Sybase in over 10 years :P). The fact that there are 11 joins is almost certainly part of the problem, although it might be possible to solve using join hints. Otherwise, it's either denormalize or redesign. — Aaronaught, Dec 28 '09 at 20:02
Without seeing a complete schema and query, I wouldn't know whether or not they apply (it's a long shot, honestly). But you can force a particular type of join by specifying MERGE JOIN or HASH JOIN or LOOP JOIN. It's useful if your query knows more about the number or order of rows than the query optimizer does. Normally you should never need to do this if your statistics are up to date (`EXEC sp_updatestats`), but I've seen a few instances where it can improve join performance. — Aaronaught, Dec 28 '09 at 20:37
May be too obvious but are you sure the ON condition is not forcing sql to do any implicit conversion? — Max Favilli, Jan 23 '16 at 02:18

score 29 · Accepted Answer · answered Dec 28 '09 at 19:22

29

I'm generalizing here, but...

A clustered index seek is, for the most part, the best-case scenario. The only ways I can think of to improve performance would be:

Update the query to return fewer rows/columns, if possible;
Defragment or rebuild the index;
Partition the index across multiple disks/servers.

If it's only returning 138 rows, and it's that slow... maybe it's being blocked by some other process? Are you testing this in isolation, or are other users/processes online at the same time? Or maybe it's even a hardware problem, like a disk failure.

answered Dec 28 '09 at 19:22

Aaronaught

120,909
25
266
342

There could be a couple of other users on but not many. Also this is a DW query. If there are no updates going on in the db then there shouldn't be any locks that would prevent me from reading, right? – Abe Miessler Dec 28 '09 at 19:39
Some other process might be doing a `SELECT * FROM stu` with no filter, retrieving all 8 million rows; that would definitely slow other queries down due to the I/O bottleneck. Unlikely, but possible. – Aaronaught Dec 28 '09 at 19:56

score 17 · Answer 2 · answered Jan 30 '13 at 06:59

Clustered Index seeks occur when non-clustered indexes are used and aren't necessarily bad.

Consider the following query:

SELECT s.StuKey, s.Name, s.Address, s.City, s.State FROM stu s WHERE State='TX'

If there is only a clustered index on StuKey, then Sql Server only has 1 option, it must scan the entire table looking for rows where State="TX' and return those rows.

If you add a non-clustered index on State

CREATE INDEX IX_Stu_State on Stu (State)

Now Sql server has a new option. It can choose to seek using the non-clustered index, which will produce the rows where State='TX'. However, in order to get the remaining columns to return in the SELECT, it has to look up those columns by doing a clustered index seek for each row.

If you want to reduce the clustered index seeks, then you can make your index "covering" by including extra columns in it.

 CREATE INDEX IX_Stu_State2 on Stu (State) INCLUDE (name, address, city )

This index now contains all the columns needed to answer the query above. The query will do an index seek to return only the rows where State='TX', and the additional columns can be pulled out of the non-clustered index, so the clustered index seeks go away.

score 9 · Answer 3 · edited Dec 28 '09 at 20:21

A clustered index range seek that returns 138 rows is not your problem.

Technically you can improve the seek performance by making the clustered index narrower:

evict all varlenght into a separate allocation unit by setting 'large value types out of row' to 1 and recreating the table from scratch).
enable page compression (SQL 2008 EE only).

Both can have quite a dramatic impact on range seek time, as they reduce the IO and the need to hit physical reads. Of course, as usually, the result will vary on a big number of other factors, like what columns do you project (evicting a projected column into BLOB allocation unit may actually have adverse effects on certain queries). As a side note, usually fragmentation will have only a marginal impact on such a short range scan. Again, it depends.

But as I say, I highly doubt this is your true problem. You have only posted selected parts of the plan and the results of your own analysis. The true root cause may lay completely elsewhere.

gbn · Answer 4 · 2009-12-28T19:40:46.260

3

Thoughts...

Why is IX_Stu clustered? Internally, SQL Server adds a 4 byte "uniqueifier" to non-unique clustered indexes. What is the justification? This also bloats your PK too
What is the actual query you are running?
Finally, why FILLFACTOR 80%?

Edit:

A "normal" FILLFACTOR would be 90%, but this is a rule of thumb only
An 11 join query? That's most likely your problem. What are your JOINs, WHERE clauses etc? What is the full text plan?

edited Dec 28 '09 at 19:40

answered Dec 28 '09 at 19:28

gbn

422,506
82
585
676

2

In regards to your first question the I made IX_Stu clustered because it is going to be used for the most joins. I thought that would improve performance, am I mixed up here? Second question, I'd rather not post the query up on the internet if I can avoid it. It's very large with eleven joins if that helps you at all... Last question: I created the index through management studio and not by a query. Apparently management studio added the FILLFACTOR 80%. Could that potentially cause problems? – Abe Miessler Dec 28 '09 at 19:37

score 3 · Answer 5 · answered Dec 28 '09 at 20:32

Some general advice: when I have to do query optimization, I start by writing out what I think the execution plan should be.

Once I've decided what I think the execution plan should be, I try to make the actual query fit this plan. The techniques to do this are different for each DBMS, and do not necessarily transfer from one to the other, or even, sometimes, between different versions of the DBMS.

The thing to keep in mind is that the DBMS can only execute one join at a time: it starts with two initial tables, joins those, and then takes the result of that operation and joins it to the next table. The goal at each step is to minimize the number of rows in the intermediate result set (more correctly, to minimize the number of blocks that have to be read to produce the intermediate results, but this generally means fewest rows).

Interesting. Say that I know that a certain join reduces my set from 6 million rows down to about 500,000. Does it improve performance to have that Join come before the other joins or maybe create a CTE that holds the contents of that Join and then use that going forward? — Abe Miessler, Dec 28 '09 at 20:42

score 2 · Answer 6 · edited Aug 27 '13 at 14:04

What happens if you hard-code your WHERE criteria, like this:

SELECT StuCertKey, StuKey FROM stu 
WHERE stuKey in (/* list 50 values of StuKey here */)

If it's still very slow, you have an internal problem of some kind. If it's faster, then the index isn't your bottleneck, it's the JOINs that you're doing to create the WHERE filter.

Note that SELECT * can be very slow if there are many large columns, and especially if there are BLOBs.

score 1 · Answer 7 · answered Dec 28 '09 at 19:20

1

Have you tried some maintenance on this index? Like defrag it? Seems really strange that it costs THAT much (120.381). Index seek is the fastest index operation, shouldn't take that long. Can you post the query?

answered Dec 28 '09 at 19:20

Pedro

11,514
5
27
40

Any suggestions on how much de-fragmentation is too much de-fragmentation? – Abe Miessler Dec 28 '09 at 19:31

score 1 · Answer 8 · answered May 04 '17 at 07:31

Check the index statictics.

reCalculating the clustered-index statistics will solve the problem.

in my case, i was looking for 30 records in 40M recored. the execution plan says it's going through the clustered-index but it took about 200ms. and the index wasn't defragmented. after recalculating it's stats, it's getting done under 10ms!

score 0 · Answer 9 · edited May 23 '15 at 03:26

0

Rebuild the index, and calculate stats?

The only other way that I can think to speed it up is to partition the table, which may or may not be possible.

edited May 23 '15 at 03:26

shA.t

16,580
5
54
111

answered Dec 28 '09 at 19:22

Paul Creasey

28,321
10
54
90

Tried rebuilding indexes and stats. no luck though. I might be looking into partitioning the tables next. Is that a good solution for slow seeks? – Abe Miessler Dec 31 '09 at 00:28

How to improve performance on a clustered index seek

9 Answers9