If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because on GUIDs you absolutely never ever can have range scans on them (what could between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac'
possibly mean??). With a width of 16 bytes, a GUID clustered index key is wider than a row id that you'd get from a heap, so a NC index on a PK guid is actually strategy that can be defended in a discussion.
But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like date
, state
or deleted
. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'.
The only scenario where heaps have significant performance benefit is inserts, specially bulk inserts. IF your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines.
Going over over your points:
Almost certainly want to establish a clustered index on every table in
your database. If a table does not
have one. Performance of most common
queries is better.
A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere as helpful as one that can satisfy a range.
Clustered indexes are not always bad on GUIDs... it all depends upon
the needs of your application. The
INSERT speed will suffer, but the
SELECT speed will be improved.
Only probe SELECTs will be improved: SELECT ... WHERE key='someguid';
. Queries by object ID and Foreign key lookups will benefit from this clustered index. A NC index can server the same purpose just as well.
The problem with clustered indexes in a GUID field are that the GUIDs are
random, so when a new record is
inserted, a significant portion of the
data on disk has to be moved to insert
the records into the middle of the
table.
Wrong. Insert into position in an index does not have to move data. The worst it can happen is a page-split. A Page-split is (somehow) expensive, but is not the end of the world. You comment suggest that all data (or at least a 'significant' part) has to be moved to make room for the new row, this is nowhere near true.
Clustered index on GUID is ok in situations where the GUID has a
meaning and improves performance by
placing related data close to each
other
http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
I can't possibly imagine a scenario where GUID can have 'related data'. A GUID is the quintessential random structure how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in linked article would not work for unique keys, only for foreign keys).
Clustering doesn't affect lookup speed - a unique non-clustered index
should do the job.
For probes (lookup a specific unique key) yes. A NC index is almost as fast as the clustered index (the NC index lookup does require and additional key lookup to fetch in the rest of the columns). Where clustered index shines is range scans, as it the clustered index can cover any query, while a NC index that could potentially satisfy the same range may loose on the coverage and trigger the Index Tipping Point.