0

If primary keys are always sorted, how can I store Guids in random order as primary key.

Akshay Raut
  • 413
  • 5
  • 19
  • Here's a great article discussing just that: http://blog.sqlauthority.com/2013/02/10/sql-server-primary-key-and-nonclustered-index-in-simple-words/ –  Oct 03 '14 at 19:17

3 Answers3

2

No, the table data is not always stored in the order of the primary key, but usually the primary key has a clustered index, and the data is always stored in the order of the clustered index.

If you don't want the data stored in the order of the primary key, you should use a non-clustered index for it.

Note that eventhough you usually get the data in the order that it is stored, the order is not guaranteed unless you use an order by clause. If the order is at all important, you should always specify what it should be.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • okay.. does Non-clustered index have any disadvantages? Thank you for answering. – Akshay Raut Oct 05 '14 at 09:37
  • @AKshayRaut: Only that records that are next to each other in the index are not stored next to each other in the table, so fetching a range of records can be somewhat slower. However having a clustered index for guids would cause heavy fragmentation, which would reduce performance for all queries, not just a few special cases. – Guffa Oct 05 '14 at 12:18
1

Well, primary keys aren't necessarily stored in sorted order on the disk. But clustered indexes are. And in the vast majority of cases the primary key is the clustered index. Though this doesn't necessarily guarantee sorting of results, it's just that results are usually sorted by the clustered index by default.

how can I store Guids in random order as primary key

GUIDs don't make for good clustered indexes for exactly this reason. SQL Server does have something called a Sequential GUID to address this. The resulting GUIDs won't be consecutive, but they will be sequential. It has some caveats, though:

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started.

If the system reboots, the sequence is lost. If multiple systems create keys, the sequence is lost. Additionally, there's the problem that we're still relying on the SQL Server to generate the key, which kind of defeats a significant reason to use a GUID.

In general I would suggest not using a GUID as a clustered index. As an alternative one might use a normal IDENTITY key as the clustered index and create a separate GUID column (with potentially an index of its own, and even a unique constraint just to make sure applications don't try to re-insert an existing record). That separate column becomes a kind of "global identifier" in a more business-logic sense, and not so much in a data persistence implementation sense.

David
  • 208,112
  • 36
  • 198
  • 279
  • 2
    Clustered index doesn't mean the data is necessarily in that order on disc. [my answer here goes into more detail](http://stackoverflow.com/a/24470091/73226) – Martin Smith Oct 03 '14 at 19:31
  • @MartinSmith: Am I misinterpreting the documentation? http://msdn.microsoft.com/en-us/library/ms190457.aspx `"Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order."` – David Oct 03 '14 at 19:33
  • 1
    The documentation is extremely misleading on this point. It should probably be corrected. – Martin Smith Oct 03 '14 at 19:38
  • right thank you. Also if I use IDENTITY column I'd make my data predictable to users, for e.g. User IDs on my asp.net website hold links to their profile photo and some other details, that are stored on server folder in the form of txt and jpg files, and these files will be in sequence once I use IDENTITY so it becomes transparent to clients. Since GUIDs may have an impact on performances, I guess I'd have to find other means of IDing. – Akshay Raut Oct 05 '14 at 09:52
  • @AKshayRaut: Why is predictability bad? A user may be able to guess an identifier that's in use, but that shouldn't matter. If that's a security hole in the system then I suspect there's a much more important security problem to fix, unrelated to identifiers. Don't rely on security through obscurity. – David Oct 05 '14 at 11:31
  • @David then what should I use according to you, to refer to user's files? – Akshay Raut Oct 07 '14 at 11:53
  • @AKshayRaut: Any identifier for those "files" would work for just referring to them. All an identifier needs to be is contextually unique. A file name itself can be an identifier since full names are unique on a file system. You could also name the file with a database identifier, or have a database table which associates file names with identifiers. None of this has to do with authorization. Relying on difficult-to-guess URLs for security is called "security through obscurity" and isn't a valid means of protecting data. If authorization is required, then require it for that data. – David Oct 07 '14 at 11:59
  • Right. I understand it now. Can you post some links of asp.net articles that can help me understand authorization for clients? Thank you for the effort and untangling the requirements. – Akshay Raut Oct 07 '14 at 12:08
  • @AKshayRaut: Well, there's this: http://msdn.microsoft.com/en-us/library/vstudio/wce3kxhd%28v=vs.100%29.aspx Basically I just did a Google search for "ASP.NET authorization" and there is a lot of information to be found on the subject. – David Oct 07 '14 at 12:09
1

No primary keys are not always stored in sorted order.

Nor are clustered indexes always stored in sorted order either, contrary to popular misconception.

If you choose a random GUID as a clustered primary key then you will likely very soon end up with a highly fragmented clustered index where the physical and logical order diverge greatly as pages become full and need splitting.

Generally most clustered index scans happen in logical (index key) order by following the page pointers rather than allocation (page number) order. In order for an allocation ordered scan to be considered you must be running at read uncommitted isolation level or a table lock must be held.

No order of results is ever guaranteed without order by however.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845