I need to generate a unique ID and was considering Guid.NewGuid to do this, which generates something of the form:

0fe66778-c4a8-4f93-9bda-366224df6f11

This is a little long for the string-type database column that it will end up residing in, so I was planning on truncating it.

The question is: Is one end of a GUID preferable to the rest in terms of uniqueness? Should I be lopping off the start, the end, or removing parts from the middle? Or does it just not matter?

izb
  • That is a good question. I have tended to use the middle bits, but I do not believe there is a difference. – Aliostad Oct 31 '11 at 16:59
  • I will run a Monte Carlo experiment and will publish the results. – Aliostad Oct 31 '11 at 17:01
  • Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with any hexadecimal digits for x but only one of 8, 9, A, or B for y, e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479. – user194076 Oct 31 '11 at 17:01
  • If you can't possibly keep all of it (and you really should!), consider taking the GUID's 128-bit value and re-encoding it in something more compact that you can squeeze into your shorter string field. – Clinton Pierce Oct 31 '11 at 17:02
  • You still expect it to be globally unique after truncation, do you? Consider a different definition of uniqueness (like locally unique, unique within your server farm, etc.). You might be able to get away with a smaller ID. – Seva Alekseyev Oct 31 '11 at 17:22
  • I don't expect it to be globally unique, only reasonably unique within the constraints of my limited database column, and only (for the purposes of my application) for a short time. – izb Nov 01 '11 at 07:30

5 Answers

You can save space by using a base64 string instead:

using System;

var g = Guid.NewGuid();
var s = Convert.ToBase64String(g.ToByteArray()); // 24 characters, always ending in "=="

Console.WriteLine(g);
Console.WriteLine(s);

This will save you 12 characters, giving 24 instead of 36 (8 if you weren't using the hyphens).
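The two trailing "=" characters are always the same for a 16-byte GUID, so they can be dropped as well. The sketch below assumes you want the result to be URL-safe too. It uses hypothetical helper names (`ToShortString`, `FromShortString`), trims the padding to get 22 characters, swaps '+' and '/' for '-' and '_', and reverses both steps when decoding.

```csharp
using System;

// Hypothetical helper: 22-character, URL-safe base64 form of a Guid.
static string ToShortString(Guid g) =>
    Convert.ToBase64String(g.ToByteArray())
        .TrimEnd('=')          // 16 bytes always encode to 22 chars followed by "=="
        .Replace('+', '-')     // '+' and '/' are awkward in URLs and file names
        .Replace('/', '_');

// Hypothetical helper: undo the substitutions and restore the padding before decoding.
static Guid FromShortString(string s) =>
    new Guid(Convert.FromBase64String(
        s.Replace('-', '+').Replace('_', '/') + "=="));

var id = Guid.NewGuid();
var shortId = ToShortString(id);
Console.WriteLine($"{id} -> {shortId}");
Console.WriteLine(FromShortString(shortId) == id); // True
```

Base64 is case-sensitive. If the database column uses a case-insensitive collation, two different IDs could compare as equal, a problem that the hex form does not have.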

Austin Salonen

Keep all of it.

From the above link:

* Four bits to encode the computer number,
* 56 bits for the timestamp, and
* four bits as a uniquifier.

You can redefine the Guid to right-size it to your needs.
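The split quoted above adds up to exactly 64 bits, so it fits in a `ulong`. The minimal sketch below packs and unpacks such an ID. Several details are assumptions that the source does not state: the machine number sits in the top 4 bits and the uniquifier in the low 4 bits, the timestamp is milliseconds since the Unix epoch, and `PackId`/`UnpackId` are hypothetical names.

```csharp
using System;

// Assumed layout (not specified in the source):
//   bits 60-63: machine number (4 bits, 0-15)
//   bits  4-59: timestamp      (56 bits; assumed to be Unix milliseconds)
//   bits  0-3 : uniquifier     (4 bits, 0-15)
static ulong PackId(uint machine, ulong timestamp, uint uniquifier)
{
    if (machine > 0xF) throw new ArgumentOutOfRangeException(nameof(machine));
    if (uniquifier > 0xF) throw new ArgumentOutOfRangeException(nameof(uniquifier));
    if (timestamp >= (1UL << 56)) throw new ArgumentOutOfRangeException(nameof(timestamp));
    return ((ulong)machine << 60) | (timestamp << 4) | uniquifier;
}

// Splits an ID back into its three fields.
static (uint Machine, ulong Timestamp, uint Uniquifier) UnpackId(ulong id) =>
    ((uint)(id >> 60), (id >> 4) & ((1UL << 56) - 1), (uint)(id & 0xF));

ulong nowMs = (ulong)DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
ulong newId = PackId(machine: 3, timestamp: nowMs, uniquifier: 7);
Console.WriteLine($"{newId:X16} -> {UnpackId(newId)}");
```

With only 4 bits each, this layout allows at most 16 machines and 16 IDs per machine per timestamp tick. That is the kind of trade-off that "right-sizing" involves.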

Rob Haupt
  • The included information is not relevant to Version 4 UUIDs. There may be reasons to keep the entire GUID, but this is effectively a bare link, and even though it's a good read, the relevant (and accurate) information should be present in an answer. – user2864740 Oct 29 '18 at 00:52

If the GUID were simply a random number, you could keep an arbitrary subset of the bits and suffer a certain percent chance of collision that you can calculate with the "birthday algorithm":

using System;

double numBirthdays = 365;  // set to e.g. 18446744073709551616d (2^64) for 64 bits
double numPeople = 23;      // set to the maximum number of GUIDs you intend to store
double probability = 1;     // probability that all birthdays are different
for (int x = 1; x < numPeople; x++)
    probability *= (numBirthdays - x) / numBirthdays;

Console.WriteLine("Probability that at least two people share a birthday:");
Console.WriteLine((1 - probability).ToString());
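When the number of IDs is large, the loop above runs millions of times. The standard birthday-bound approximation, P ≈ 1 − exp(−n(n−1)/(2N)), gives essentially the same figure in constant time. This is a swapped-in formula rather than the author's loop, and `CollisionProbability` is a hypothetical name. For very small values, the sketch returns the first-order term directly, because 1 − exp(−k) loses precision in double arithmetic.

```csharp
using System;

// Closed-form birthday-bound approximation:
//   P(at least one collision) ≈ 1 - exp(-n(n-1) / (2N))
// N = number of possible values (2^bits for a bits-wide ID), n = number of IDs.
static double CollisionProbability(double possibleValues, double count)
{
    double k = count * (count - 1) / (2 * possibleValues);
    // For tiny k, 1 - exp(-k) cancels badly; k itself is then accurate
    // (the next term, k*k/2, is negligible).
    return k < 1e-10 ? k : 1 - Math.Exp(-k);
}

Console.WriteLine(CollisionProbability(Math.Pow(2, 64), 2_000_000)); // roughly 1.08E-07
Console.WriteLine(CollisionProbability(Math.Pow(2, 48), 2_000_000)); // roughly 0.0071
Console.WriteLine(CollisionProbability(365, 23));                   // roughly 0.5
```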

However, the probability of a collision is often higher, because GUIDs are in general NOT random. According to Wikipedia's GUID article there are five versions of GUID. The 13th hex digit specifies which version you have, so it tends not to vary much. In standard (RFC 4122) GUIDs, the top two bits of the 17th digit are always fixed at binary 10, which makes that digit 8, 9, A or B.
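Before deciding how to truncate a GUID, you can check its version and variant by reading them from the string form. The minimal sketch below assumes that digits are counted without the hyphens. `Inspect` is a hypothetical helper name.

```csharp
using System;

// "N" format gives 32 hex digits with no hyphens, so the 13th digit is at
// index 12 and the 17th digit is at index 16.
static (int Version, int VariantBits) Inspect(Guid g)
{
    string hex = g.ToString("N");
    int version = Convert.ToInt32(hex[12].ToString(), 16);
    int variantBits = Convert.ToInt32(hex[16].ToString(), 16) >> 2; // top 2 of the 4 bits
    return (version, variantBits);
}

var info = Inspect(Guid.NewGuid());
// Guid.NewGuid() on .NET typically produces version 4 GUIDs, so this usually
// prints (4, 2): version 4, variant bits binary 10.
Console.WriteLine(info);
```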

For each version of GUID you'll get a different degree of randomness. Version 4 (13th digit = 4) is entirely random except for the 13th digit and the top two bits of the 17th, which leaves 122 random bits. Versions 3 and 5 are effectively random, as they are cryptographic hashes (MD5 and SHA-1 respectively). Versions 1 and 2 are mostly NOT random, although certain parts are fairly random in practice. A "gotcha" for version 1 and 2 GUIDs is that many GUIDs could come from the same machine. In that case they will share a large number of identical bits: in particular the last 48 bits, which identify the machine, and many of the time bits. Or, if many GUIDs were created at the same time on different machines, you could have collisions between the time bits. So, good luck safely truncating that.

I had a situation where my software only supported 64 bits for unique IDs so I couldn't use GUIDs directly. Luckily all of the GUIDs were type 4, so I could get 64 bits that were random or nearly random. I had two million records to store, and the birthday algorithm indicated that the probability of a collision was 1.08420141198273 x 10^-07 for 64 bits and 0.007 (0.7%) for 48 bits. This should be assumed to be the best-case scenario, since a decrease in randomness will usually increase the probability of collision.
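The answer doesn't say which 64 bits were kept. One option is to XOR the two 8-byte halves of a version 4 GUID. This is an assumption on my part, not the author's method. In `Guid.ToByteArray()` the fixed version nibble is in byte 7 and the fixed variant bits are in byte 8, so they fall in different halves and at different bit positions. As a result, no output bit is stuck at a constant value. `FoldTo64` and `Truncate` are hypothetical names.

```csharp
using System;

// XOR the two halves of the GUID's 16 bytes. The fixed version and variant
// bits line up with random bits from the other half, so every output bit is random.
static ulong FoldTo64(Guid g)
{
    byte[] b = g.ToByteArray();
    return BitConverter.ToUInt64(b, 0) ^ BitConverter.ToUInt64(b, 8);
}

// Keep only the low `bits` bits, e.g. 48 for the smaller case discussed above.
static ulong Truncate(ulong value, int bits) =>
    bits >= 64 ? value : value & ((1UL << bits) - 1);

ulong id64 = FoldTo64(Guid.NewGuid());
Console.WriteLine($"{id64:X16} {Truncate(id64, 48):X12}");
```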

I suppose that in theory, more GUID types could exist in the future than are defined now, so a future-proof truncation algorithm is not possible.

Qwertie
  • I appreciate the proof. I had seen the algorithm before, but never considered using it like this. Thanks for the lesson. – jsuddsjr Mar 18 '15 at 19:25
  • Disagree... a Guid is not uniformly random bits; it is derived from the computer, the time and an indexer, so on the same machine there will be a static set of bits. – Tomer W Dec 26 '17 at 13:24
  • Also, for something like `newsequentialid()`, choosing certain bits is almost 0% random :} – user2864740 Oct 29 '18 at 00:51

I agree with Rob: keep all of it.

But since you said you're going into a database, I thought I'd point out that just using GUIDs doesn't necessarily mean that they will index well in a database. For that reason, the NHibernate developers created a Guid.Comb algorithm that's more DB-friendly.

See NHibernate POID Generators revealed and documentation on the Guid Algorithms for more information.

NOTE: Guid.Comb is designed to improve performance on Microsoft SQL Server.
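The sketch below is a simplified version of the COMB idea. It is not NHibernate's implementation. It keeps the first 10 random bytes of a new GUID and overwrites the last 6 bytes with a big-endian timestamp. This relies on SQL Server comparing uniqueidentifier values starting with those last six bytes, so GUIDs generated later tend to sort later. `NewComb` is a hypothetical name, and the millisecond timestamp is an assumed choice.

```csharp
using System;

// Expects a UTC DateTime (e.g. DateTime.UtcNow).
static Guid NewComb(DateTime utcNow)
{
    byte[] bytes = Guid.NewGuid().ToByteArray();
    long ms = new DateTimeOffset(utcNow, TimeSpan.Zero).ToUnixTimeMilliseconds();
    // Write the low 48 bits of the timestamp big-endian into bytes 10..15
    // (most significant byte in byte 10, least significant in byte 15).
    for (int i = 0; i < 6; i++)
        bytes[15 - i] = (byte)(ms >> (8 * i));
    return new Guid(bytes);
}

Console.WriteLine(NewComb(DateTime.UtcNow));
```

Because the first 10 bytes stay random, the value remains hard to guess, while inserts land near the end of the index instead of on random pages.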

Kaleb Pederson

Truncating a GUID is a bad idea; please see this article for why.

You should consider generating a shorter GUID instead; a Google search reveals several solutions. These generally take a GUID and re-encode its 128 bits using a wider range of characters than the 16 hex digits, up to the full 8-bit range of 256 values per character.
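The size of the character set directly determines how short the string can be. The sketch below computes the minimum number of characters needed to represent every 128-bit value with an alphabet of a given size. It uses exact integer arithmetic so that no rounding from logarithms creeps in. `CharsFor128Bits` is a hypothetical name.

```csharp
using System;
using System.Numerics;

// Smallest n such that alphabetSize^n >= 2^128.
static int CharsFor128Bits(int alphabetSize)
{
    if (alphabetSize < 2) throw new ArgumentOutOfRangeException(nameof(alphabetSize));
    BigInteger target = BigInteger.One << 128;
    BigInteger capacity = BigInteger.One;
    int n = 0;
    while (capacity < target)
    {
        capacity *= alphabetSize;
        n++;
    }
    return n;
}

foreach (int size in new[] { 16, 62, 64, 85, 256 })
    Console.WriteLine($"base {size}: {CharsFor128Bits(size)} characters");
```

Hex needs 32 characters, base64 needs 22, and a full 8-bit range needs 16. ASCII itself only has 128 codes, many of them unprintable, so the 16-character form needs a column that can store arbitrary byte values.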

NibblyPig