-1

I have a table which has 4 columns (region_id, product_id, cate_id, month_id) as a primary key. This primary key was created as default, so a clustered index were created for PK. This table contains more than 10 millions rows.

If I delete existing pk and create a new pk with non-clustered index type, is it better than clustered index for the following query?

select region_id, product_id, cate_id, month_id, a, b, c 
from fact_a
where month_id > 100

Thanks in advance.

Kay
  • 1
  • 1
  • If this is query a that runs a lot sure why not? Non clustered indexes serve to improve performance where the clustered index doesnt cover it. – JonH Mar 06 '15 at 19:22
  • You can't have more than one clustered index on a table. Can you elaborate what you actually mean here? – Lasse V. Karlsen Mar 06 '15 at 19:25
  • I've updated my question, sorry for the confusion. – Kay Mar 06 '15 at 20:11
  • What **datatypes** are those columns? Typically, a clustered index should be **as narrow as possible**, unique, and static. Having four columns is rarely ideal.... – marc_s Mar 06 '15 at 20:15
  • It's not a good approach to ask such one-off questions. You learn little here. Just read a (short) indexing tutorial, this question will be easily answered by that. – usr Mar 06 '15 at 21:26

2 Answers2

0

A simple nonclustered index on month_id will certainly improve the average performance for that query (assuming month_id for most of the rows is less than 100, so that the where clause excludes most of the rows). However, if you're creating the index specifically for that query (or any queries with month_id in the where clause and a, b, c, month_id or a subset of those in the select), you will get even better results by including the selected values in the index, like this:

CREATE INDEX index_fact_a_month_id ON fact_a (month_id) INCLUDE (a,b,c)
James
  • 3,551
  • 1
  • 28
  • 38
0

The quick answer, yes, removing the primary key (moreso, replacing the current multi-column Primary Key with a single identity column) and then creating your NCI on Month_ID will be better/faster/more efficient.

Clustered Index - it IS the data. It contains every column of every row in the table. There can only be one CI because the table data only needs to exist once. Each row has a key...

Primary Key - it is the key to identify a row in a Clustered Index.

Non-Clustered Index - it acts as a table of a subset of columns from the rows in the Clustered Index.

Keeping it simple, a Non-Clustered Index contains less data than the Clustered Index, and it orders the data in a way (Month_id ASC) that makes queries against it much more efficient than querying against the CI (A, B, C, Month_ID). SQL Server has no way to "dip" into the CI Primary Key or row data and say, "Hey, I'm filtering by Month_ID, so I'll just go right to that column." By nature of Clustered Indexes, SQL Server "reads" all CI rows (index scan), every column, every byte of data. Very inefficient and wasteful since your WHERE clause will be filtering out a lot of these rows.

The Non Clustered Index only contains a subset of columns, so it is much more efficient in that it can say, "Hey, I'm filtering by Month_ID, and I only contain Month_ID, aaannnd Month_ID is in ascending order, so I can just jump right to the rows that I want!" (index seek). Much more efficient since only the rows you want to return will be "read" by SQL Server.

Getting a little more advanced, since the Non Clustered Index is only Month_ID, but you are querying for all the columns in the Clustered Index, SQL Server needs to be able to go back to the CI from the NCI to get rest of the columns. To do that, the Primary Key of the CI is stored in the NCI, along with the column subset. So the NCI is really like a two column table of (Month_ID, CI Primary Key).

If your Primary Key is monstrous, your NCIs will also be monstrous, and therefore less efficient (more disk reads, more buffer pool consumption, bad database stuff).

Disclaimer: there can be specific scenarios where you want every column to be the clustered index key/pk. I don't sense that is applicable here, but it is possible. If you have a heavily used query that refers to every column of the table in where clauses or joins, than a coverage clustered index may be beneficial.

Sully
  • 1,313
  • 10
  • 14