0

I have a table structure like below :

FeatureList

ID  - BIGINT - Primary Key  - Clustered Index
VIN - VARCHAR(50)
Text - VARCHAR(50)
Value - VARCHAR(50)

Most of the query I execute on this are like :

SELECT * FROM FeatureList WHERE VIN = 'ABCD'    --- Will give multiple records

OR 

DELETE FROM FeatureList WHERE VIN = 'ABCD'

I want to know, is VIN column is a good candidate for nonclustered index? Or it might degrade the performance?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Akon
  • 272
  • 1
  • 6
  • 20
  • Why is the PK a BIGINT? Are you really going to have more than 2 billion features in this table? Anyway, without knowing much else about your system, and how additional / wider indexes may impact your overall workload, it may make more sense to make the VIN clustered. Then at least a lookup won't be required to get the other columns in your SELECT * query... – Aaron Bertrand Nov 14 '13 at 05:06
  • Left headlight, right headlight, wheel front left, wheel front right... Yeah, you can totally get over 2B features on a car – billinkc Nov 14 '13 at 05:07
  • Not at this stage....currently I can see it to grow up to .. 5 to 10 million records..But is there a Harm in declaring it as BIGINT? – Akon Nov 14 '13 at 05:09
  • @billinkc - You totally got me :) – Akon Nov 14 '13 at 05:09
  • BIGINT = 8 bytes, INT = 4 bytes - that's an extra 4 bytes per row, plus an extra 4 bytes per row in any non-clustered index too. 10 million rows * 4 bytes * 1+? indexes = a lot of wasted space on disk and, more importantly, in memory. – Aaron Bertrand Nov 14 '13 at 05:11
  • Don't declare VIN as varchar(50). There are standards for these things. [VIN](http://en.wikipedia.org/wiki/Vehicle_identification_number) Assuming [US/CA VIN](http://www.ecfr.gov/cgi-bin/text-idx?c=ecfr&rgn=div5&view=text&node=49:6.1.2.3.31&idno=49) – billinkc Nov 14 '13 at 05:11
  • @AaronBertrand - Yes..that makes sense..but changing them now does not seem to be possible on my part..Should I make it a Non-Clustered index to improve performance? – Akon Nov 14 '13 at 05:14
  • @billinkc - It seems I am late to implement this..I will definitely keep it in my mind..next time I design something..Thanks. – Akon Nov 14 '13 at 05:15
  • @Akon sorry, I just don't have enough information here to help you. This is like me asking you if I should change my car over to snow tires, and you don't even know where I live. Do you actually have a performance problem? Or are you prematurely optimizing? – Aaron Bertrand Nov 14 '13 at 05:16
  • 1
    NCI is of questionable value (deferring to AB or others) as even if the engine seeks to the position, it will still have to do a key lookup to go to the physical index to get the rest of the data for a select. If you didn't have `SELECT *`, then maybe a NCI would be right for you. – billinkc Nov 14 '13 at 05:17
  • @AaronBertrand - But does not it make sense..from the 2 types of query that I use..What information exactly will help here? – Akon Nov 14 '13 at 05:18
  • @billinkc - Its a mix of `SELECT *` and `SELECT VIN,Text,Value` - I understand your point here. – Akon Nov 14 '13 at 05:19
  • Yes, SQL Server has theorized plans, but nothing is guaranteed. You can even give it hints and it may or may not use what you tell or expect it to. The best bet is to test the performance of your queries with and without the Non-clustered index. As others have said, your selected fields may have an impact. E.g., an index on VIN should most likely help if you do `Select Vin From ...`, but the index may be ignored if you do `Select Vin, Field2 from...`. – ps2goat Nov 14 '13 at 05:22
  • Depending on the complexity of queries, yours being extremely simple, you may need hundreds of thousands or even millions of rows before you start seeing noticeable impacts to performance. – ps2goat Nov 14 '13 at 05:23
  • I have millions of records in those tables from migration script.. – Akon Nov 14 '13 at 05:26

1 Answers1

0

Not declaring an index on VIN absolutely will drastically degrade performance. You take a small performance hit on each insert, delete, or update involving VIN. Reads (especially once you get into millions of records) will run orders of magnitude faster.

As for BIGINT versus INT, I generally go for BIGINT. Yes, it takes up a bit more disk space. Yes, it takes up a bit more memory. The plus side for me, though, is that I never, ever have to worry about migrating the table (and every other table that takes ID as a foreign key) to BIGINT. Been there. Done that. The extra space is worth it.

MichaelMilom
  • 3,024
  • 1
  • 16
  • 25
  • The table does have millions of records now..Depending upon the query I have posted..does it make sense to add NCI to VIN column..as some are saying the opposite.. – Akon Nov 14 '13 at 05:27
  • 1
    @Akon: it depends heavily on **how many** rows out of your millions your query will return. Also: using `SELECT *` makes it less likely that an index will be used; better fetch **only** columns that you **really need** - not just everything. So yes - `VIN` is definitely a candidate - you'll need to add the index and measure if it really helps your queries. – marc_s Nov 14 '13 at 05:35
  • `SELECT * FROM FeatureList WHERE VIN = 'ABCD'` this will give approx of 5 to 10 to 15 records.. Does not WHERE clause play a role in NCI – Akon Nov 14 '13 at 05:37
  • I would go for the index. SELECT * will certainly impact **how much** data is returned from the query, but I am not sure I agree that it will impact index usage. Lack of a WHERE clause? Absolutely. I would love to see query plan stats to back that up. – MichaelMilom Nov 14 '13 at 23:24