1

A brief overview of the scenario:

My database uses GUID's as primary keys, and, for what I've been reading, it seems like it's somewhat bad to have clustered indexes on GUID's (increases fragmentation, slows down inserts etc.). My project uses hibernate so we usually deal with jpql and fetching of full entities (a lot of queries end up turning into select p.* from person p [...])

I would like to know if it would be a good approach to create non-clustered indexes covering all columns of a table (in order to avoid RID lookups, etc.).

Thanks for the help, already!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Diego Martins
  • 89
  • 1
  • 7
  • 1
    No sense in adding non-clustered index on all columns if you aren't going to be searching by them. If you are going to be just searching by GUID then just make a non-clustered index on the GUID column. – mclaassen Jul 03 '14 at 19:11
  • No, no, sorry. Maybe I didn't make myself clear. The point is not creating non-clustered indexes in every column, but creating a non-clustered index in a SINGLE column (which I would be using a lot for search or joinings) and including every other column at the leaf level of this index. For instance: I have a table Person with columns like (person_uid, agency_uid, foo_uid, birth date, foo, bar) The PK is person_uid. Instead of creating a clustered-index on person_uid, I would like to create a non-clustered and include (agency_uid, foo_uid, birth date, foo, bar) – Diego Martins Jul 03 '14 at 19:22
  • 3
    A non clustered index, including all columns, on a heap is worse than simply having a clustered index. The NCI is bigger than the equivalent CI would be as it also stores the additional RID plus you have two copies of all the data. The NCI would be just as prone to fragmentation as the CI would be. This isn't an issue only encountered in clustered indexes. If you are only seeking single rows fragmentation itself won't be much of an issue for you anyway though you may want to look at fill factor to reduce page splits. – Martin Smith Jul 03 '14 at 19:46
  • Ooh... So even if disk space wasn't an issue, it would still be better to have a clustered index. The thing is: the way the database was designed, the default SQL behavior (creating a clustered index in the Primary Key) was disabled. So the vast majority of our tables doesn't have a Clustered Index, and lots of queries make RID lookups. I was searching for a solution to that. I think I'll at least include the PK at the leaf level of every NCI. – Diego Martins Jul 03 '14 at 19:48
  • Just add a `ID INT IDENTITY(1,1)` column and make that the primary key and clustered index! That would probably make the most sense since clustered tables are more efficient in general than heaps for every operation ... – marc_s Jul 03 '14 at 20:45
  • The problem is that it would create a great amount of bureaucracy (creating a new column in every table and using it). It would lead to lots of discussion and disagreement. So I was searching for a more immediate solution, something I could get done and would bring improvements now. – Diego Martins Jul 03 '14 at 20:57

2 Answers2

2

No, it is not a good approach. It sounds like you've already read that having the clustered index on a GUID is a bad idea. Instead, create an int (or bigint, if necessary) identity field and make that the clustered index, unless another field makes more sense. Then just create a nonclustered index on the GUID field, and let SQL do an RID lookup for each query that uses it. This way you can avoid fragmentation and slow inserts/updates/deletes.

Dave.Gugg
  • 6,561
  • 3
  • 24
  • 43
  • Dave, imagine that I just can't change the architecture at this level (create another identifier). Let's say I have to stick with UID's. What else could I do? – Diego Martins Jul 03 '14 at 19:26
  • Put the clustered index on a non-unique set of fields. Not the best solution, but that may be what you're stuck with. – Dave.Gugg Jul 03 '14 at 19:28
  • Why in a non-unique? Shouldn't indexes be preferably unique? – Diego Martins Jul 03 '14 at 19:30
  • Preferably yes, but if you don't have a field or set of fields that will be unique, you may be forced to have non-unique values. – Dave.Gugg Jul 03 '14 at 19:32
  • The problem is, the way the database was designed, the default SQL behavior (creating a clustered index in the Primary Key) was disabled. So a lot (and I really mean a lot) of queries make RID Lookups. The other indexes doesn't even include the primary key (which they would do by default if the PK had a clustered index). – Diego Martins Jul 03 '14 at 19:32
  • Sorry, I'm not really experient with this index stuff, but I'm really enjoying learning it. I don't get the point of creating a clustered index on the non-unique set of fields. What good would it do? – Diego Martins Jul 03 '14 at 19:34
  • It would still order the data, but each time non-unique values occurred SQL would have to scan through each of the values. It is far better to use unique fields, but sometimes it is not possible. – Dave.Gugg Jul 03 '14 at 19:38
  • And that would be better than creating non-clustered with every column included in the leaf level? Say I have a table Person with columns like (person_uid, agency_uid, foo_uid, birth date, foo, bar), and would like to get a person by its person_uid and project all the columns, every time. Wouldn't it be better to create a non-clustered index on person_uid including (agency_uid, foo_uid, birth date, foo, bar)? – Diego Martins Jul 03 '14 at 19:45
  • You're basically doubling the clustered index. Every insert, update, or delete will have to work on both the clustered index and your non-clustered index. – Dave.Gugg Jul 03 '14 at 19:51
  • The best option is to move the clustered index onto a new column or columns, how about birth date + first name + last name? – Dave.Gugg Jul 03 '14 at 19:52
  • But if I'm searching for person_uid, would this index be useful? Wouldn't it make an index seek on NCI person_uid and then a clustered index seek on the CI, or a CI scan directly? – Diego Martins Jul 03 '14 at 20:01
  • Yes, you'd still want to have a nonclustered index on person_uid, and you would still get a RID lookup. – Dave.Gugg Jul 03 '14 at 20:03
  • So it would still be better to have a RID lookup than to have a Clustered Index on person_uid? – Diego Martins Jul 03 '14 at 20:07
  • Selects will take longer, updates/deletes/inserts will go faster. Is that better? That depends on your situation. – Dave.Gugg Jul 03 '14 at 20:10
  • Something that gets me confused is this paper from Microsoft I read one of these days: http://technet.microsoft.com/en-us/library/cc917672.aspx#ECAA It shows inserts on clustered indexes slightly faster than on non-clustered, and that surprised me. I even benchmarked with GUID keys, and had nearly the same results! The only problem was fragmentation. – Diego Martins Jul 03 '14 at 20:17
0

Premature optimization is a bad idea. Is the data size cost and effort added to inserts, updates, and deletes worth adding the index? Unless you measure and test performance and the impact of your index, you won't know. Look at the queries that read the table and see which, if any, are unacceptably long. Then tune that specific query.

Vulcronos
  • 3,428
  • 3
  • 16
  • 24
  • The problem is, the way the database was designed, the default SQL behavior (creating a clustered index in the Primary Key) was disabled. So a lot (and I really mean a lot) of queries make RID Lookups. The other indexes doesn't even include the primary key (which they would do by default if the PK had a clustered index). – Diego Martins Jul 03 '14 at 19:32
  • @DiegoMartins I see. Adding an index like you suggested would still double the data stored. – Vulcronos Jul 03 '14 at 19:39
  • Yes, that's true. But I've been told that db size is not really an issue. I was thinking about taking this approach only on some of our tables. – Diego Martins Jul 03 '14 at 19:43
  • @DiegoMartins I can see that working. RID lookups are only expensive in bulk, so for queries that only return a single row it won't be worth it. For queries that return large data sets, then I can see a chance of it being helpful, but I would want to test first. – Vulcronos Jul 03 '14 at 19:50
  • Yes, sure! Not always I'm bringing lots of results, but mainly the application is hitting the database a lot of times with the same queries (changing the parameters). And these queries have somewhat of "unnecessary" RID lookups. So I think it would still be good in this scenario? – Diego Martins Jul 03 '14 at 19:55
  • @DiegoMartins If the queries change the parameters and don't use the key field of your index, then sql server could just scan the clustered key for the rows if your index doesn't offer an advantage in finding the rows. Make sure the index actually gets used. Unless the key fields of the index help the query, I don't expect any performance gains. – Vulcronos Jul 03 '14 at 20:03
  • The fields are used for search (or for joins with other tables). I guess it will be helpful. – Diego Martins Jul 03 '14 at 20:09