1

I was just reading from here https://stackoverflow.com/questions/6187904/hard-and-fast-rule-for-include-columns-in-index regarding included columns.

n index is typically

CREATE INDEX <name> ON <table> (KeyColList) INCLUDE (NonKeyColList)

Where:

  • KeyColList = Key columns = used for row restriction and processing
    WHERE, JOIN, ORDER BY, GROUP BY etc
  • NonKeyColList = Non-key columns = used in SELECT and aggregation (e.g. SUM(col)) after selection/restriction

Now let's say my query is :

SELECT  Col7 ,
        Col8 ,
        Col9 ,
        Col10
FROM    [MyTable] M
WHERE   Col1 =1
        AND Col2=2
        AND Col3=3                         
        AND Col4=4 
        AND Col5=5 
        AND Col6=6
GROUP BY Col7 ,
         Col8 ,
         Col9 ,
         Col10
ORDER BY Col8

What should be the index for me here in this case? and second case where Col7 is primary key

Community
  • 1
  • 1
Rocky Singh
  • 15,128
  • 29
  • 99
  • 146
  • 3
    Exactly what I suggested here: http://stackoverflow.com/questions/6186038/question-regarding-index/6186100#6186100 – gbn May 31 '11 at 13:54
  • Sounds like a job for Database Engine Tuning Advisor – JeffO May 31 '11 at 14:18

2 Answers2

1

I would think you would want something like:

CREATE NONCLUSTERED INDEX MyIndex ON MyTable(Col1, Col2, Col3, Col4, Col5, Col6)
INCLUDE (Col7, Col8, Col9, Col10)

You're filtering on Col1-Col6 and retrieving Col7-Col10. Not sure how this will work with the group by, though. You may want to rewrite it as a DISTINCT since the exec plan and results are the same but it will be more readable.

If Col7 is a primary key and has a clustered index, you can leave this like it is. Col7 will be included in the index without being specifically mentioned as a cluster key is in every non-clustered index as a row identifier, but keeping it in won't hurt as SQL will ignore it.

JNK
  • 63,321
  • 15
  • 122
  • 138
  • I mentioned that in the first of OPs' questions today! http://stackoverflow.com/questions/6186038/question-regarding-index/6186100#6186100 (except for the PK stuff) – gbn May 31 '11 at 13:55
  • It may be worth putting Col7 as the final column inside the index key, as you'll want to guarantee that the index is sorted based on Col7 (after Col1-Col6 have been matched), to aid in the GROUP BY operation (i.e., GROUP BY requires sorted rows). – Jim McLeod May 31 '11 at 13:56
  • @Jim - if we put it as the final column it couldn't use it for the `GROUP BY` regardless since `Col7` is the first column in the `GROUP BY`. – JNK May 31 '11 at 13:58
  • 1
    If Col7 is the PK, then it's unique, and thus grouping by Col8-10 is redundant. Given that Col1-Col6 are given as constants in the WHERE clause, we would end up with an ordered unique set, for each group where Col1-Col6 are matched. Thus, Col7 would be pre-sorted, and usable by the GROUP BY. – Jim McLeod May 31 '11 at 14:04
  • @Jim - good point! I stand corrected. Forgot to account for the Unique factor. – JNK May 31 '11 at 14:05
0

For this query it depends on how many rows are restricted by your where for each of col 1-6. The best index is the one that is for the column that has lthe column that has the highest selectivity and returns the least rows per value.

the group by will mean that Col7 might benefit by being part of the index so rows come back grouped already.

If all of the Col 1-6 provide a large amount of the table then perhaps Col7,Col8 might better to get the group by correct.

But in any case you will need to test as it depends on the distribution of the data values.

mmmmmm
  • 32,227
  • 27
  • 88
  • 117
  • -1 for: `The best index is the one that is for the column that has less rows ` - Not sure how you can make a table where the columns all have different rowcounts. – JNK May 31 '11 at 13:37
  • I must learn to reread what I write - edited to try to show what I meant. – mmmmmm May 31 '11 at 13:40
  • gotcha, DV removed. Maybe you can rephrase as "the column that has the highest selectivity and returns the least rows per value" – JNK May 31 '11 at 13:41