20

I would like optimize the performance of a database that my team is using for an application.

I have been looking for areas to add foreign keys, and in turn index those columns to improve the performance of joins. However, many of our tables are joined on an id that is a GUID type, generated upon insertion of an item, and the data associated with that item in other tables is generally has column item_id containing the GUID.

I have read that adding clustered indexes to GUID type columns is a very bad decision because the index will need to be constantly reconstructed in order to be effective. However, I was wondering, is there any detriment to utilizing a non-clustered index in the scenario described above? Or is it reasonable to assume that it would help performance? I can provide more information if needed.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Christian
  • 1,685
  • 9
  • 28
  • 48

5 Answers5

25

An index on a <anytype> is by far the best option you have to improve joins and singleton lookups. Lacking this index the query will always have to scan the entire table end-to-end with (often) abysmal performance results and concurrency gone down the drain.

It is true that uniqueidentifier makes poor choice for indexes for the reasons you mention, but by no means does that implies that you should not create these indexes. Changing the data type to INT or BIGINT would be advisable, if possible. Using NEWSEQUENTIALID() or UuidCreateSequential to generate them would help with fragmentation issues. If all alternatives fail you may have to do index maintenance (Rebuild, reorganize) operations more often than for other indexes. But by no means do any of these drawbacks outweigh the benefit of having the index in the first place!

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • I am not sure if I will be able to change the data type we have used, as we are nearing a deadline and I don't know what the implications of switching to an int would be at this point(many stored procedures utilize the column so it would involve many updates regarding the datatype). When you say an index on a , do you mean that an index on an appropriate column of any data type improve the performance of queries on the table? Also, do you feel that a non-clustered index on a `uniqueidentifier` is a better choice then a clustered as indicated in the response below? – Christian Dec 10 '12 at 18:41
  • If you are referring to my answer. Reserve the clustered index for the PK (the table the FK is pointing to). – paparazzo Dec 10 '12 at 19:04
  • 1
    By 'anytype' I mean that even the worst kind of indexable datatype you can think of (eg. even a `varchar(900)`) is often better to have the index than not to have it. If an index is needed (== seek operations occur) then there is just no alternative not to have the index. Of course is better to improve the data type if possible. As always there are many shades of gray between white and black, but the starting point should be 'the index is necessary, how can we lower the impact of having it?' – Remus Rusanu Dec 10 '12 at 19:16
  • 3
    And yes, a NC index on an non-sequential uniqueidentifier is better as a clustered one *from the point of view of index structure and health*. However, if the uniquiedtifier ID column si the one most lookups (seek) occur one, then it should be the clustered one, even at the cost of an poorly structured (fragmented) index because having it as NC index and having to lookup the rest of the columns for each row *may* defeat its purpose. See http://www.sqlskills.com/blogs/kimberly/category/the-tipping-point.aspx – Remus Rusanu Dec 10 '12 at 19:20
5

Two performance:
- insert
- select

An index should improve select

An index will slow slow down insert.
If the inserts are in order the index does not fragment.
If the inserts are not in order the index will fragment.
Index fragmentation slows down both insert and select.
Via maintenance can defragment the index.

Adding an non-clustered index to the column that references a FK will help the joins.
Since that column is most likely not ordered that fact it is a GUID is of no loss.

On the FK table itself is where GUID is not a good candidate for a PK (clustered index).
With GUID as PK that index fragments on insert.
Int or sequential ID are better candidates as they would not fragment the PK on insert.
But no big deal just defragment those tables.

paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • Thank you for the info. How does the slow down for insertion compare to the performance increase in retrieval? Do they completely balance, favor either increase or decrease, or is it entirely case dependent? Once the application is deployed we will not be able to access the database, can tasks be automated within the database upon creation?/deploy? – Christian Dec 10 '12 at 18:44
  • Varies. A properly designed and maintained index will not double insert time. On a big table it might benefit select by 10x or more. If you have more selects than insert / update and the column is in the where or join clause then a candidate for index. I typically index and FK. As for as not able to access that database once it is deployed then more reason to have the PK be int or sequential. – paparazzo Dec 10 '12 at 19:01
2

Yes, you are better off changing the Guid index from clustered to non-clustered. Guid can still be primary key and you don't need to change your query/source code. No reordering of data and increased performance.

In databases like SQL Azure it is mandatory to have a clustered index. So you could use a date/datetime field. Creating a additional int-identity/autoincrement column is unnecessary as some developers in one team tend to use those and others GUID. Resulting in an inconsistent application. So keep only GUID.. full stop!

Talking about sequential Guids, I think Guids are better created from code than from database. Modern DALs and repository patterns do not prefer dependencies on DB for CRUD. e.g. scenario: linq query and automated builds with unit testing with out DB dependency. And creating a sequential guid ourselves is not a good idea(atleast for me). So Guid as primary Key with a non-clustered index is the best option there is.

I have backing from Microsoft on the non-clustered subject http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx

Edited: Backing is gone ("No Resource Found")

Blue Clouds
  • 7,295
  • 4
  • 71
  • 112
1

It would usually help performance. But you may wish to create the index with a fillfactor of less than 100% such that the inevitable page-splits don't have to happen quite so often. Regular maintenance on the index would certainly be a plus.

Matt Whitfield
  • 6,436
  • 3
  • 29
  • 44
0

Yes, a non-clustered index would be ideal for your situation. The underlying is a B-tree, like the clustered index, but the underlying data on the table is not sorted, so the problems with the non-sequential nature of the GUID does not exist. The NC index exist separately from the table.

Be careful to not add too many non-clustered indices though. Optimize only where you need to. Run the profiler to see which queries are taking a long time, and optimize only those. Additionally, be sure to set the fill factor to a value <50% unless the database rarely gets any updates, or space is a constraint.

Relevant MSDN: http://msdn.microsoft.com/en-us/library/ms177484(v=sql.105).aspx

Fredrik E
  • 1,840
  • 13
  • 18
  • *but it is not sorted, so the problems with the non-sequential nature of the GUID does not exist* That's actually wrong. NC indexes are sorted, like each and every BTree out there. They also do suffer from all the guid issues. – Remus Rusanu Dec 10 '12 at 20:05
  • Ah, that is not what I meant. The underlying data is not sorted in a non-clustered index, so the table does not suffer from the guid issues. I have updated my answer. Thanks for pointing it out. – Fredrik E Dec 13 '12 at 11:35