0

It is well known that using random values in a column with a clustered index isn't a good idea, which is why using GUIDs for a primary key with a clustered index is usually not recommended. Using the newsequentialid() function, we can overcome most of these difficulties.

However, what happens if you are generating your GUIDs on a farm of web servers, all hitting the same database? I am creating sequential IDs in .NET code using UuidCreateSequential as described in this article: http://blogs.msdn.com/b/dbrowne/archive/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net.aspx

The problem is that while the resulting GUIDs are sequential from a single machine, the same is not true across multiple machines. Because the most significant 11 bytes (according to SQL Server) seem to stay pretty much the same for the same machine, it effectively sorts by machine and then time, rather than the desired opposite.

Would it be worthwhile and doable to reorder the bytes in the GUID to get near-sequential GUIDs between machines, or should I give up and make the indexes non-clustered?

Thanks!

kodbuse
  • 990
  • 2
  • 9
  • 20
  • 2
    *Any* clustered index (primary key or otherwise) based upon a GUID is a bad idea, for exactly the reasons you've outlined here. Creating additional work just for the sake of maintaining a semi-arbitrary sort order seems wasteful to me. Why not just establish a more sensible clustered index? – alroc Sep 22 '15 at 20:56
  • 1
    If you are just going to use sequential GUIDs why bother with them at all? They take up a huge amount of storage and now they are totally predictable and repeatable. – Sean Lange Sep 22 '15 at 20:57
  • 1
    @SeanLange: The reason is to control the IDs from code, which can be convenient/practical, especially in a DDD-based architecture. – kodbuse Sep 22 '15 at 21:16
  • @alroc: I figured there might be some performance benefits to maintaining a clustered index based on the primary GUID key, if those keys are in natural order (i.e. related rows are likely to be close together) – kodbuse Sep 22 '15 at 21:16
  • Good candidates for a clustering key are small, unique, and monotonic. One out of three ain't bad I guess. ;) If you're hell bent on using GUIDs at all, you could have them be a non-clustered primary key and a bog standard identity column be your clustering key. – Ben Thul Sep 22 '15 at 22:12
  • Aside from the fore mentioned "don't do this" the alternative would be to implement the key generation into an independent subsystem. Rather than each web farm being able generate ID's disparately they would have to get the new ID from a centralized service. Probably not worth the headache. MS follows similar practice in their own technology ie Dynamics CRM – Brad D Sep 23 '15 at 01:59
  • Actually @BradD this is a somewhat common thing now that we have sequences. You could put the sequence on the centralized DB and use that as a consistent numbering mechanism. – Sean Lange Sep 23 '15 at 13:10

2 Answers2

1

After trying this, I'm going to answer my own question and say that generating sequential GUIDs (COMB GUIDs) from multiple machines as described in the question is a non-issue. Essentially you will have one separate sequence of IDs per machine, which will not result in page splits, since they will be appended to the end of different pages, and not in the middle of a page (as a new ID will always be the largest in its sequence).

While a GUID may not be as efficient as an int, I have not had any problems using this approach with millions of rows per table.

kodbuse
  • 990
  • 2
  • 9
  • 20
  • Please don't depend on `NEWSEQUENTIALID()` for generating sequential GUIDs. Firstly, [the server's NIC will be used to seed the generator](https://connect.microsoft.com/SQLServer/feedback/details/475131/newsequentialid-is-not-sequential), so if it changes you're in for trouble. Lastly, [after a server restart there's no guarantee that the previous sequence will be reserved](http://dba.stackexchange.com/questions/53617/newsequentialid-resets-sequence-on-server-restart). – easuter Aug 15 '16 at 19:32
  • 1
    @easuter: For the reasons already stated, the Mac address changing would be a non-issue. It'd be no different than adding another server. It's also true that the sequence can change with a reboot, although I've had trouble finding information on exactly what that would look like. Regardless, the point is that these pseudo sequential GUIDs work fine for the situation I described and are in no way comparable to using NEWID(). – kodbuse Aug 26 '16 at 01:01
0

you can also generate your id's on c#, take a look into this post on code project the prb is that the code generated by this implementation doesn’t match what NEWSEQUENTIALID generates, since that my goal was that the c# code would generate the last 6 bytes of the Guid as the NewSequentialID function of Sql server, I end up with the following code.

public static Guid ToSequentialAtEnd(this Guid guid)
{
    byte[] guidArray = guid.ToByteArray();

    DateTime now = DateTime.UtcNow;
    var baseDate = new DateTime(1900, 1, 1);

    // Get the days and milliseconds which will be used to build the byte string 
    var days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = now.TimeOfDay;

    // Convert to a byte array 
    // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.33333333 
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.33333333));

    // Reverse the bytes to match SQL Servers ordering 
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the guid 
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new Guid(guidArray);
}