4

Table Structure (SalesPersonProduct)

SalesPersonProductId INT (PK, Clustered)   
SalesPersonId        INT (FK - non-unique - Nullable) 
ProductId            INT (FK - non-unique - Nullable)
AreaId               INT  (Not Null)
DistributionType     INT  (Not Null)
ItemOrder            INT  (Not Null)
ItemSize             INT  (Not Null)

The scenario: SalesPersonId is non-unique, in fact there could be 70 - 80 duplicates in this column. However, this is the only column used for reads and joins and currently does not have an index.

SalesPersonProductId is simply used as a PK for reference, updates to these table are handled through the PK but are very infrequent. Neither the PK or the SalesPersonId will be updated once a row has been added to the table.

The question:
Assume I can make no changes to the table structure, or related tables. Would I be better served changing the existing PK to an non-clustered index, and adding the SalesPersonId as a clustered index? OR just adding the SalesPersonId as a standard non-clustered index?

I'm aware behind the scenes the many non-unique values would be getting unique identifiers - however this column is used for virtually all reads and I would expect this would offer the best performance gain, though at the cost of the additional bytes now used to preserve "uniqueness" on this new clustered index.

  • 1
    The cost isn't the additional bytes, it is the page splits you would get from the writes caused by not having an every increasing pattern for your clustering key. If there are a lot of writes, you may find a non-clustered index (with possibly the most common columns included as non key columns) is better. If you are mostly reading, and reading large enough quantities of data that a non clustered index on `SalesPersonID` actually causes you issues then you may find that the there is a net gain from having `SalesPersonId` as your clustering key. Long story short, it depends. Do some tests. – GarethD Mar 27 '17 at 12:35
  • 1
    Is there anything that makes your rows unique except the `SalesPersonProductId` ? – SqlZim Mar 27 '17 at 12:43
  • Is (SalesPersonId, ProductId) unique? – Antonín Lejsek Mar 27 '17 at 12:52
  • @AntonínLejsek - No – AverageProgrammer Mar 27 '17 at 19:02
  • @SqlZim - possibly (SalesPersonId, ProductId, AreaId) – AverageProgrammer Mar 27 '17 at 19:02
  • Are there any nonclustered indexes on this table? How many columns are we talking about? What's the byte length of each row? Are there variable size columns in this table? – SqlZim Mar 27 '17 at 19:29
  • @SqlZim - added the remaining table structure. There are no indices on this table beyond the currently clustered PK. (SalesPersonProductId) – AverageProgrammer Mar 28 '17 at 11:54
  • @GarethD - could you explain the page splits? Does this cost performance, memory, or both? Primary concern is performance here since there is currently no index at all on SalesPersonId and it is being used for the reads and the joins. We also pull all columns I believe in every case. So short of an includes, a clustered index should circumvent a bookmark look up correct? – AverageProgrammer Mar 28 '17 at 11:58
  • It will cost performance during inserts - This article measures exactly how much impact it can have - https://www.sqlskills.com/blogs/paul/how-expensive-are-page-splits-in-terms-of-transaction-log/ – GarethD Mar 28 '17 at 12:33
  • @GarethD - I think that's heading in the correct direction. So additional info based on what I've seen. Multi-tenant environment in which each tenant is completely isolated. Anywhere from 5 - 75 sales people per tenant who on creation by default are assigned somewhere around 30 products to sell on average. – AverageProgrammer Mar 28 '17 at 17:41
  • There is very little point in discussing the hypothetical at great length, the best solution is to [race your horses](https://ericlippert.com/2012/12/17/performance-rant/). You basically have two alternatives, (1) do some testing, set up the various scenarios and run a load of inserts and selects that would replicate the usage, and test the time taken for each, (2) go with one of your options, and if later on performance becomes an issue, try the alternative. – GarethD Mar 28 '17 at 17:52
  • @GarethD I think that is an as acceptable an answer as any other. I had actually started such testing yesterday after adding the new index as a non-clustered index and an excessive amount of records from what we typically see in a production environment. Just from reviewing the index usage and operational stats I'm seeing far more reads on the existing PK in excess of what I had been led to expect. Though I understand that could be somewhat inflated from the bookmark look ups from the non clustered index then pointing to the PK. (There are no included columns in the NCI) – AverageProgrammer Mar 29 '17 at 14:43

1 Answers1

0

you are missing one thing.

i) which of the column will be use most to Search i.e. use most in where clause.if SalesPersonId then make it CI ,or you can make both SalesPersonId and productid as CI.

ii) If you can go one step further then calculate Seletivity of Index also

https://www.google.co.in/webhp?sourceid=chrome-instant&rlz=1C1CHMO_enIN556IN556&ion=1&espv=2&ie=UTF-8#q=how+to+find+selectivity+of+an+index

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22