17

I'm using Python's UUID function to create unique IDs for objects to be stored in a database:

>>> import uuid
>>> print(uuid.uuid4())
2eec67d5-450a-48d4-a92f-e387530b1b8b

Is it ok to assume that this is indeed a unique ID?

Or should I check my database to confirm that the ID has not already been generated before accepting it as valid?

ensnare
  • [This question](http://programmers.stackexchange.com/questions/130261/uuid-collisions) may be informative. In essence, it is possible that the ID is nonunique, however the chance of that is unbelievably small. – BrenBarn Jun 01 '14 at 18:39
  • The odds of two `uuid4`s colliding are 1 in `16**32`. – roippi Jun 01 '14 at 18:41
  • @roippi It's not that easy. Part of the UUID is a timestamp. So, if the UUIDs were generated at different times, the probability of a duplicate is 0. Otherwise, it depends on the RNG that's being used. [In Python 3](http://svn.python.org/projects/python/branches/py3k/Lib/uuid.py), the RNG of libc or libuuid is used, if available. Otherwise, it tries `os.urandom` and falls back to the `random` module. So, in conclusion, you should be safe. (Sorry for being overly specific, but I thought that might be interesting, for you or for the OP or someone else.) – Carsten Jun 01 '14 at 18:52

3 Answers

11

I would use uuid1, which has zero chance of collisions because it takes the date/time into account when generating the UUID (unless you are generating a great number of UUIDs at the same time).

You can actually reverse a uuid1 value to retrieve the original timestamp that was used to generate it.
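As a rough sketch of that reversal: the `time` attribute of a version 1 UUID holds a 60-bit count of 100-nanosecond intervals since 1582-10-15, the Gregorian epoch used by RFC 4122. The helper name `uuid1_to_datetime` is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# RFC 4122 timestamps count 100 ns intervals from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_to_datetime(u):
    """Return the UTC creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u = uuid.uuid1()
print(u, "was generated at", uuid1_to_datetime(u))
```

Note that this also means a uuid1 value reveals when (and, through the node field, possibly where) it was created, which may or may not be desirable for database keys.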

uuid4 generates a random ID. Because it doesn't use a monotonically increasing timestamp as an input (or include one in the output UUID), a value that was previously generated has a (very) small chance of being generated again in the future.
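To put a number on "very small", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes the 122 random bits of a version 4 UUID come from a good random source; the function name `collision_probability` is just for illustration.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 128 bits, 6 of them fixed (version and variant)

def collision_probability(n):
    """Approximate chance that n random version 4 UUIDs contain a duplicate."""
    space = 2 ** RANDOM_BITS
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} UUIDs: p ~ {collision_probability(n):.3e}")
```

The estimate says nothing about a weak or badly seeded random number generator, which is the more realistic way to get duplicates.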

Martin Konecny
  • UUID version 1 values are guaranteed unique only if you use an actual MAC address. Be careful if you use virtual machines, which generally have fake network cards. – Serge Ballesta Jun 01 '14 at 20:28
6

You should always have a duplicate check. Even though the odds are very much in your favor, a duplicate is always possible.

I would recommend just adding a unique key constraint in your database and retrying if the insert fails with a duplicate-key error.
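A minimal sketch of that approach, using `sqlite3` only so the example is self-contained. The table `objects`, its columns, and the helper `insert_with_unique_id` are hypothetical; with another database the exception class for a duplicate key will differ.

```python
import sqlite3
import uuid

def insert_with_unique_id(conn, name, max_attempts=3):
    """Insert a row under a fresh UUID, retrying if the key already exists."""
    for _ in range(max_attempts):
        new_id = str(uuid.uuid4())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO objects (id, name) VALUES (?, ?)", (new_id, name)
                )
            return new_id
        except sqlite3.IntegrityError:
            continue  # duplicate key: generate another UUID and try again
    raise RuntimeError("could not generate a unique ID")

conn = sqlite3.connect(":memory:")
# PRIMARY KEY makes the database itself reject a repeated id.
conn.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, name TEXT)")
print(insert_with_unique_id(conn, "example"))
```

Letting the database enforce uniqueness avoids a separate "does this ID exist?" query, which would be both slower and open to race conditions between the check and the insert.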

Wolph
6

As long as you create all UUIDs on the same system, and unless there is a very serious flaw in the Python implementation (which I really cannot imagine), RFC 4122 states that they will all be distinct (edited: if using version 1, 3 or 5).

The only problem that could arise with UUIDs is if two systems create a UUID at exactly the same moment and:

  • use the same MAC address on their network cards (really uncommon) and you are using UUID version 1
  • or use the same name and you are using UUID version 3 or 5
  • or get the same random number and you are using UUID version 4 (*)

So if you have a real MAC address, or use an official DNS name or a unique LDAP DN, you can take it as given that the generated UUIDs will be globally unique.
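For the name-based versions, a short illustration of why uniqueness depends entirely on the name: `uuid5` (and likewise `uuid3`) is a deterministic hash of the namespace and the name. The domain names below are placeholders.

```python
import uuid

# uuid5 hashes (namespace, name) with SHA-1, so the same inputs always give the same UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
c = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")

print(a == b)  # True
print(a == c)  # False: a different name gives a different UUID
```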

So IMHO, you only have to check uniqueness if you want to protect your application against a malicious attack that deliberately tries to reuse an existing UUID.

EDIT: As stated by Martin Konecny, in uuid4 the timestamp part is random too and not monotonic. So the possibility of a collision is very small but not 0.

Serge Ballesta