0

Currently I'm trying to write SQL Server NEWSEQUENTIALID() in .NET Core 2.2 that should be running really fast and also it should allocate minimum possible amount memory but I need clarification how calculate uuid version and when (which byte to place it or what bit shift is needed). So now I have generated timestamp, retrieved mac address and copied bytes 8 and 9 from some base random generated guid but surely I'm missing something because results doesn't match with output of original algorithm.

byte[16] guidArray; 
// mac
guidArray[15] = macBytes[5];
guidArray[14] = macBytes[4];
guidArray[13] = macBytes[3];
guidArray[12] = macBytes[2];
guidArray[11] = macBytes[1];
guidArray[10] = macBytes[0];

// base guid
guidArray[9] = baseGuidBytes[9];
guidArray[8] = baseGuidBytes[8];

// time
guidArray[7] = ticksDiffBytes[0];
guidArray[6] = ticksDiffBytes[1];
guidArray[5] = ticksDiffBytes[2];
guidArray[4] = ticksDiffBytes[3];
guidArray[3] = ticksDiffBytes[4];
guidArray[2] = ticksDiffBytes[5];
guidArray[1] = ticksDiffBytes[6];
guidArray[0] = ticksDiffBytes[7];

var guid = new Guid(guidArray);

Current benchmark results:

                      Method |      Mean |    Error |   StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |----------:|---------:|---------:|------:|--------:|-------:|------:|------:|----------:|
| SqlServerNewSequentialGuid |  37.31 ns | 0.680 ns | 0.636 ns |  1.00 |    0.00 | 0.0127 |     - |     - |      80 B |
|              Guid_Standard |  63.29 ns | 0.435 ns | 0.386 ns |  1.70 |    0.03 |      - |     - |     - |         - |
|                  Guid_Comb | 299.57 ns | 2.902 ns | 2.715 ns |  8.03 |    0.13 | 0.0162 |     - |     - |     104 B |
|              Guid_Comb_New | 266.92 ns | 3.173 ns | 2.813 ns |  7.16 |    0.11 | 0.0162 |     - |     - |     104 B |
|                 MyFastGuid |  70.08 ns | 1.011 ns | 0.946 ns |  1.88 |    0.05 | 0.0050 |     - |     - |      32 B |

Update: Here are the latest results of benchmarking common id generators written in .net core.

Benchmark results As u can see my implementation NewSequentialGuid_PureNetCore is at most 2x worst performing then wrapper around rpcrt4.dll (which was my baseline) but me implementation eats less memory (30B).

Here are a sequence of sample first 10 guids:

492bea01-456f-3166-0001-e0d55e8cb96a
492bea01-456f-37a5-0002-e0d55e8cb96a
492bea01-456f-aca5-0003-e0d55e8cb96a
492bea01-456f-bba5-0004-e0d55e8cb96a
492bea01-456f-c5a5-0005-e0d55e8cb96a
492bea01-456f-cea5-0006-e0d55e8cb96a
492bea01-456f-d7a5-0007-e0d55e8cb96a
492bea01-456f-dfa5-0008-e0d55e8cb96a
492bea01-456f-e8a5-0009-e0d55e8cb96a
492bea01-456f-f1a5-000a-e0d55e8cb96a

If u want code then give me a sign ;)

Macko
  • 906
  • 3
  • 11
  • 27

1 Answers1

1

The official documentation states it quite clearly:

NEWSEQUENTIALID is a wrapper over the Windows UuidCreateSequential function, with some byte shuffling applied.

There are also links in the quoted paragraph which might be of interest for you. However, considering that the original code is written in C / C++, I somehow doubt that .NET can outperform it, so reusing the same approach might be a more prudent choice (even though it would involve unmanaged calls).

Having said that, I sincerely hope that you have researched the behaviour of this function and considered all its side effects before deciding to pursue this approach. And I certainly hope you aren't going to use this output as a clustered index for your table(s). The reason for this is also mentioned in the docs (as a warning, no less):

The UuidCreateSequential function has hardware dependencies. On SQL Server, clusters of sequential values can develop when databases (such as contained databases) are moved to other computers. When using Always On and on SQL Database, clusters of sequential values can develop if the database fails over to a different computer.

Basically, the function generates a monotonous sequence only while the database is in the same hosting environment. When:

  • a network card gets changed on the bare metal (or whatever else the function depends upon), or
  • a backup is restored someplace else (think Prod-to-Dev refresh, or simply prod migration / upgrade), or
  • a failover happens, whether in a cluster or in an AlwaysOn configuration

, the new SQL Server instance will have its own range of generated values, which is supposed not to overlap the ranges of other instances on other machines. If that new range comes "before" the existing values, you'll end up with fragmentation issues for absolutely no good reason. Oh, and top (1) to get the latest value won't work anymore.

Indeed, if all you need is a non-exhaustible monotonous sequence, follow the Greg Low's advice and just stick to bigint. It's half as wide, and no, you can't possibly exhaust it.

Roger Wolf
  • 7,307
  • 2
  • 24
  • 33
  • 1
    Would you care to elaborate as to the lack of fitness for a clustering key? It's wide, but should avoid the fragmentation problems inherent with a non-sequential uuid. – Ben Thul Dec 30 '19 at 07:14
  • 1
    NEWSEQUENTIALID() is basically a useless function. It provides you with a "pretend" GUID which doesn't work as well as a real GUID, but the biggest issue is that it can only be used as a column default. If you're going to go to the database to get a value, what's the possible use of getting a pretend GUID? Just get a bigint instead and be done with it. The value of GUIDs is that you don't need to get them from a central issuing authority like a database. If you're going to go to the DB, just get a bigint. – Greg Low Dec 30 '19 at 07:17
  • @greg low - hmm, this function was written by m$ specifically for db use: standard guid doesn't index well so this my case and guid is my type of choice for all pk's. Common .net solutions these days generate ids before inserting row to db ... – Macko Dec 30 '19 at 07:51
  • @ben thul - that is why this function exists in SQL server, to index better column in db and to better perform as well, so I don't want to request db to get my sequential guid ... – Macko Dec 30 '19 at 07:55
  • 1
    @GregLow - Not quite useless - for some replication scenarios where data can be created in multiple locations I can see that it could be useful. I agree if the inserts are all to one master database bigint and sequence or identity would be better though – Martin Smith Dec 30 '19 at 08:27
  • @Macko - What advantages do you see to using a sequential guid vs a sequential bigint? – Martin Smith Dec 30 '19 at 08:34
  • @BenThul, updated the answer with the explanation. Indeed, I should have added all of this from the beginning. – Roger Wolf Dec 30 '19 at 08:43
  • You won't have tons of page splits on inserts. You will just have a minor amount of logical fragmentation equivalent to the number of occasions that this failover has happened. Say machine 1 does 10,000 inserts and these take pages 100 to 200 in the file. Then it fails over to machine 2 which does another 10,000 inserts and uses pages 201 to 300 - assume machine 2 generates a lower value than machine 1 there is just a small amount of logical fragmentation in that when reading the index in key order the read will need to go from page 201 to 300 then jump to page 100 and read the next 100 – Martin Smith Dec 30 '19 at 08:58
  • @MartinSmith, I was actually thinking about physical fragmentation. Wouldn't the B-tree be continuously rebalanced around the middle of the table, where new inserts are coming? – Roger Wolf Dec 30 '19 at 09:05
  • I'm not sure what you mean by physical fragmentation here, from the fragmentation categories [here](https://logicalread.com/2013/12/11/fragmentation-sql-server-table-scans-bf01/#.Xgm-KUf7RPY) are you talking about external fragmentation on the file system? If so that is not affected by this. And no this won't cause any massive issues as you can see from https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=ec4c5e894fe2b50c2c2805e415b1e443. If you reverse the order of the inserts so the earlier cluster is inserted earlier the eventual fragmentation is much the same – Martin Smith Dec 30 '19 at 09:24
  • (above experiment will be best done with table on new empty file group for repeatable results) – Martin Smith Dec 30 '19 at 09:30
  • @Martin Smith - it looks that implementing sequential guid is much simplier than sequential bigint as I don't have to read previously generated values ... – Macko Dec 30 '19 at 09:39
  • @MartinSmith, I thought about page splits on intermediate level(s). But it's possible that MSFT has thought about it. Removed that mentioning from the answer anyway. – Roger Wolf Dec 30 '19 at 10:09
  • @Macko, before you tightly-couple your database with your app, just 2 things to consider: how you are going to persist the latest generated value between app' restarts, and how you are going to synchronise it when / if your app will run on multiple nodes of a web cluster? Neither of these will be necessary if sequence generation is left to the database, and I still don't see what kind of goal you are trying to achieve. – Roger Wolf Dec 30 '19 at 10:17
  • @Roger Wolf - first, the target is not rewrite given function as performant as possible. Frankly, I'm not going currently to persist latest generated values to some store (file system, registry or other different than server memory). If u investigate main point of this algorithm that u know that it involves generating this components: mac address (constant per server mostly), time part (dynamically changed to tick precision) and some sort of randomness (2 byte seed changed every new guid generate). It is sufficient for me to involve db to generate any id ... – Macko Dec 30 '19 at 13:34
  • @Macko, oh, so it wasn't really about databases... alright then :) – Roger Wolf Dec 30 '19 at 14:27
  • At the time the function was written, there was a major concern in the product group about the (poor) effects of GUIDs being used as clustering keys (particularly as clustered primary keys). This function was written in response to that but missed the main point. (Not everything they build is a great solution). If the database is going to be the "issuer of values", then there's no value in having it be a "pretend GUID". As I said, just use a bigint. If you have multiple machines, use ranges of values. bigints are huge. – Greg Low Dec 31 '19 at 22:57
  • I've written about this many times in the past. Here are two links: https://blog.greglow.com/2017/10/26/newsequentialid-nice-try-but-missed-the-target/ and https://blog.greglow.com/2018/05/01/opinion-newsequentialid-is-a-pointless-function/ – Greg Low Dec 31 '19 at 23:02
  • @Greg: generating bigints has multiple downsides - i think about generating non-repeatable value before inserting row to database - with ints it could be harder to generate values that don't collide often and it looks like that such algorithm should have knowledge of last generated value (memorize this value in some store). With guid there is such issue and there are some security bonuses in place. As usual some solutions solves some problems but new problems arises ... – Macko Jan 01 '20 at 00:09
  • Hi Macko, the pretend GUIDs you get from this function have no security benefits. If you assuming they aren't predictable and help you with obscurity, you'd be disappointed. The values generated are entirely predictable. Why would it be hard generating bigints that don't collide (assuming you're going to the DB anyway)? Just use a sequence. – Greg Low Jan 02 '20 at 01:29
  • @Greg - u are right in terms of security but it is completely acceptable because it is intended to be used in internal environment. With big ints it is hard because algorithm to be precise should know about previously generated values - I don't want to query db to know this value - that is my requirement. In my aproach this uuid is generated by application code not by db. – Macko Jan 02 '20 at 09:29
  • Hi Macko, so that's actually my point about NEWSEQUENTIALID(). The ONLY way it can be used is as a column default in a database. You said you don't want to query the db to know the value. That's the only thing it can do. If you just use CREATE SEQUENCE and make the object a bigint, then use it as your column default, you still get a auto-generated value with no collisions, and you get the benefit of it being an ever-increasing bigint. There's no algorithm that you need to manage. If your uuid is generated by application code, then you won't be using NEWSEQUENTIALID(). – Greg Low Jan 04 '20 at 22:02