
For a number of reasons^, I'd like to use a UUID as a primary key in some of my Django models. If I do so, will I still be able to use outside apps like "contrib.comments", "django-voting" or "django-tagging" which use generic relations via ContentType?

Using "django-voting" as an example, the Vote model looks like this:

class Vote(models.Model):
    user         = models.ForeignKey(User)
    content_type = models.ForeignKey(ContentType)
    object_id    = models.PositiveIntegerField()
    object       = generic.GenericForeignKey('content_type', 'object_id')
    vote         = models.SmallIntegerField(choices=SCORES)

This app seems to assume that the primary key of the model being voted on is an integer.

The built-in comments app seems to be capable of handling non-integer PKs, though:

class BaseCommentAbstractModel(models.Model):
    content_type   = models.ForeignKey(ContentType,
            verbose_name=_('content type'),
            related_name="content_type_set_for_%(class)s")
    object_pk      = models.TextField(_('object ID'))
    content_object = generic.GenericForeignKey(ct_field="content_type", fk_field="object_pk")

Is this "integer-PK-assumed" problem a common situation for third-party apps which would make using UUIDs a pain? Or, possibly, am I misreading this situation?

Is there a way to use UUIDs as primary keys in Django without causing too much trouble?


^ Some of the reasons: hiding object counts, preventing URL "ID crawling", using multiple servers to create non-conflicting objects, ...
mitchf

6 Answers


As seen in the documentation, Django 1.8 and later have a built-in UUIDField. The performance difference between UUID and integer keys is negligible.

import uuid
from django.db import models

class MyUUIDModel(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

You can also check this answer for more information.
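Two details of the field definition above are easy to miss. `default` receives the function `uuid.uuid4` itself, not the result of calling it, so a fresh UUID is produced for each new row (the database will not generate one for you). And per the Django docs, `UUIDField` uses the native `uuid` type on PostgreSQL, while most other backends store it as `char(32)`, the 32-character hex form. The standard-library sketch below illustrates both points; `resolve_default` is a hypothetical stand-in for how a field resolves its default, not Django's actual code.

```python
import uuid


def resolve_default(default):
    """Hypothetical helper: call the default if it is callable, like a field default."""
    return default() if callable(default) else default


# Passing the function: a fresh UUID each time a "row" is created.
per_row = [resolve_default(uuid.uuid4) for _ in range(3)]

# Passing a value computed once: every "row" would share it (and collide as a PK).
shared = uuid.uuid4()
same_for_all = [resolve_default(shared) for _ in range(3)]

# Storage forms of one UUID: 16 raw bytes, 32 hex chars (what char(32) holds),
# or the 36-character hyphenated string.
u = per_row[0]
print(len(u.bytes), len(u.hex), len(str(u)))  # 16 32 36
assert uuid.UUID(u.hex) == u  # the hex form round-trips losslessly
```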

pyjavo
keithhackbarth
  • @Keithhackbarth how do we set django to use this every time when automatically creating IDs for tables? – fIwJlxSzApHEZIl Jan 26 '17 at 20:56
  • @anon58192932 It's not really clear what exactly you mean by "every time". If you want UUIDs to be used for every model, create your own abstract base model and use it instead of django.db.models.Model. – Назар Топольський Mar 01 '17 at 16:31
  • Performance differences are only negligible when underlying database supports the UUID type. Django still uses a charfield for most DBs (postgresql is the only documented db to support the UUID field). – NirIzr Jun 06 '18 at 19:30
  • I am confused why this is a popular answer... The question was asking about difficulty with third party packages. Despite Django natively supporting UUID, there still seems to be a number of packages which don't account for UUIDs. In my experience, it is a pain. – ambe5960 Jan 06 '20 at 20:22
  • Note that this doesn't work for switching primary keys for an existing model to UUID. – infiniteloop Dec 29 '20 at 14:34

A UUID primary key will cause problems not only with generic relations, but with efficiency in general: every foreign key will be significantly more expensive—both to store, and to join on—than a machine word.

However, nothing requires the UUID to be the primary key: just make it a secondary key, by supplementing your model with a uuid field with unique=True. Use the implicit primary key as normal (internal to your system), and use the UUID as your external identifier.
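A framework-free sketch of the idea, using only the standard library: `Table` is a hypothetical stand-in for a database table, with an auto-incremented integer key for internal use and a unique UUID as the external identifier. In Django the same split is an ordinary `AutoField`/`BigAutoField` primary key plus `models.UUIDField(default=uuid.uuid4, unique=True, editable=False)`.

```python
import itertools
import uuid


class Table:
    """Hypothetical in-memory table: integer PK inside, UUID as the public key."""

    def __init__(self):
        self._next_pk = itertools.count(1)  # auto-increment, like an AutoField
        self._rows = {}      # pk -> row; internal references would use the pk
        self._by_uuid = {}   # uuid -> pk; plays the role of a unique index

    def insert(self, **fields):
        pk = next(self._next_pk)
        public_id = uuid.uuid4()
        if public_id in self._by_uuid:  # what unique=True enforces in the DB
            raise ValueError("duplicate UUID")
        row = {"pk": pk, "uuid": public_id, **fields}
        self._rows[pk] = row
        self._by_uuid[public_id] = pk
        return row

    def get_by_uuid(self, public_id):
        """External lookup (e.g. from a URL) that never needs the integer PK."""
        return self._rows[self._by_uuid[public_id]]


posts = Table()
a = posts.insert(title="first")
b = posts.insert(title="second")
print(a["pk"], b["pk"])                       # 1 2 (sequential, internal only)
print(posts.get_by_uuid(b["uuid"])["title"])  # second
```

Outsiders only ever see the UUIDs, so object counts and neighbouring IDs are not exposed, while joins keep using the compact integer key.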

Pi Delport
  • In addition, you can override `save` and generate your UUID there when an object is being saved the first time (by checking if the object has a primary key). – Joe Holloway Oct 15 '10 at 16:44
  • Joe Holloway, no need for that: you can simply supply the UUID generation function as the field's `default`. – Pi Delport Oct 15 '10 at 16:54
  • Thanks Piet. Your solution is what I'm doing now and it works for obscuring the primary key in the URI (although the comment app still shows it in a hidden field in the "create comment" form). Doesn't give me the advantage of being able to easily create non-colliding database rows on separate servers though. Oh well, I guess I'll learn to re-love the integer primary key. – mitchf Oct 15 '10 at 17:55
  • Joe: I use django_extensions.db.fields.UUIDField to create my UUIDs in my model. It's simple, I just define my field like this: user_uuid = UUIDField() – mitchf Oct 15 '10 at 17:55
  • One (annoying) side effect of doing this, and having default=uuid.uuid4 (or equivalent) is that, if you are using south for migrations, then you will need to edit your migration file each time you have a new migration, and remove the default for this field. – Matthew Schinckel Mar 04 '12 at 07:30
  • @MatthewSchinckel: When you use `django_extensions.db.fields.UUIDField` as mentioned by mitchf, you will have no problems with Django-South migrations - field mentioned by him has built-in support for South migrations. – Tadeck Apr 18 '12 at 10:10
  • Terrible answer. Postgres has native (128 bit) UUIDs which are only 2 words on a 64 bit machine, so would not be "significantly more expensive" than native 64 bit INT. – postfuturist Apr 29 '13 at 20:47
  • postfuturist: PostgreSQL's UUID type is (currently, at least) implemented as a C char array, with comparisons using memcmp(): this is indeed going to be significantly more expensive than a machine word comparison, in general. Besides that, you must consider locality: if your UUIDs are uniformly distributed, instead of sequential, your index performance can go down the drain, depending on your workload. (This is especially important for index-clustered backends, such as MySQL InnoDB.) – Pi Delport Apr 30 '13 at 05:51
  • Piet, given that it has a btree index on it, how many comparisons are there going to be on a given query? Not many. Also, I'm sure that the memcmp call will be aligned and optimized on most OSs. Based on the nature of the questions, I would say *not* using UUID because of possible (likely negligible) performance differences is the wrong optimization. – postfuturist May 02 '13 at 20:46
  • It depends, of course: that's the nature of optimization. Regarding the index, it's not the depth of the tree or the number of comparisons that are important, but the key distribution: a random distribution will essentially require that all pages fit in working memory (and make index updates more costly), while sequential keys will tend to follow the data's natural clustering. This may (or may not) have a huge impact on performance, but either way, it's something to be aware of. – Pi Delport May 03 '13 at 06:21
  • It's hard to find good references about this topic, but for example, [here](http://www.informit.com/articles/printerfriendly.aspx?p=25862) is an old article comparing randomly-distributed GUIDs to modified, serialized GUIDs on MS SQL. With a benchmark that inserts 500,000 orders, the random GUIDs took around 30 times longer than the serialized GUIDs, which took about the same time as integer primary keys. Likewise, the serialized GUIDs only consumed about 1MB of memory per 500 orders, while the random GUIDs saturated the test server's 350MB of memory for the same number of orders. – Pi Delport May 03 '13 at 06:42
  • You are over-arching, generalizing. It depends on database implementation under the hood. And I know one implementation that compare string more efficiently than machine word. – Daniel Baktiar Nov 16 '13 at 15:17
  • The only reason I want to use a uuid here is because I know that the int or a bigint primary key will be too small for me as the table is going to fill up really fast. My table would be very large. Am I over engineering this? – Rishav Feb 08 '22 at 09:36

The real problem with UUIDs as PKs is the disk fragmentation and insert degradation associated with non-sequential identifiers. Because the PK is a clustered index (in virtually every RDBMS except PostgreSQL), when it's not auto-incremented your DB engine has to re-sort rows on disk whenever a new row's ID sorts below existing ones, which happens all the time with random UUIDs. Once your DB holds a lot of data, it may take many seconds or even minutes just to insert one new record. And your disk will eventually become fragmented, requiring periodic defragmentation. This is all really bad.
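A small standard-library sketch of the access pattern behind this: a sorted Python list stands in for a clustered B-tree index, and we count how many inserts land somewhere other than the end (the mid-index inserts that cause page splits in a real index). It is an illustration, not a benchmark, and it models only the index, not how any particular engine lays out table rows (see the PostgreSQL comment below). `count_non_append_inserts` is a hypothetical helper.

```python
import bisect
import uuid


def count_non_append_inserts(keys):
    """Insert keys one by one into a sorted list (stand-in for a clustered
    index) and count how many land anywhere but the end."""
    index = []
    non_append = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos != len(index):
            non_append += 1  # landed mid-index: where page splits would happen
        index.insert(pos, key)
    return non_append


n = 1000
sequential = list(range(1, n + 1))               # auto-increment integers
random_uuids = [uuid.uuid4() for _ in range(n)]  # version-4 (random) UUIDs

print(count_non_append_inserts(sequential))    # 0
print(count_non_append_inserts(random_uuids))  # nearly all of the 1000
```

Sequential keys always append to the right edge of the index; random UUIDs scatter across it, which is why every page of the index tends to stay "hot".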

To solve for these, I recently came up with the following architecture that I thought would be worth sharing.

The UUID Pseudo-Primary-Key

This method lets you keep the benefits of a UUID as a primary key (via a uniquely indexed UUID field), while maintaining an auto-incremented PK to address the fragmentation and insert-performance degradation of a non-sequential PK.

How it works:

  1. Create an auto-incremented primary key called pkid on your DB Models.
  2. Add a unique-indexed UUID id field to allow you to search by a UUID id, instead of a numeric primary key.
  3. Point the ForeignKey to the UUID (using to_field='id') so that your foreign keys reference the pseudo-PK instead of the numeric ID.

Essentially, you will do the following:

First, create an abstract Django base model:

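import uuid

from django.db import models


class UUIDModel(models.Model):
    # Internal, auto-incremented integer primary key (keeps inserts sequential)
    pkid = models.BigAutoField(primary_key=True, editable=False)
    # Public identifier: unique-indexed UUID, generated in Python on creation
    id = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)

    class Meta:
        abstract = True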

Make sure to extend the base model instead of models.Model:

class Site(UUIDModel):
    name = models.CharField(max_length=255)

Also make sure your ForeignKeys point to the UUID id field instead of the auto-incremented pkid field:

class Page(UUIDModel):
    site = models.ForeignKey(Site, to_field='id', on_delete=models.CASCADE)

If you're using Django REST Framework (DRF), also create a base ViewSet class that sets the default lookup field:

from rest_framework import viewsets

class UUIDModelViewSet(viewsets.ModelViewSet):
    lookup_field = 'id'  # look objects up by the UUID, not the integer pkid

And extend that instead of the base ModelViewSet for your API views:

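class SiteViewSet(UUIDModelViewSet):
    queryset = Site.objects.all()
    serializer_class = SiteSerializer  # hypothetical serializer, defined elsewhere

class PageViewSet(UUIDModelViewSet):
    queryset = Page.objects.all()
    serializer_class = PageSerializer  # hypothetical serializer, defined elsewhere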
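DRF routers also read a `lookup_value_regex` attribute from the viewset (defaulting to `[^/.]+`) when building detail URLs, so a UUID pattern there would make integer-looking paths fail to route at all. The standard-library sketch below only exercises such a pattern; `UUID_LOOKUP_REGEX`, the `sites/` prefix, and `match_detail` are hypothetical, and the compiled route is an approximation of what a router generates, not DRF's code.

```python
import re
import uuid

# Hypothetical value for lookup_value_regex: lowercase 8-4-4-4-12 hex digits.
UUID_LOOKUP_REGEX = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Approximation of a router's detail route: <prefix>/<lookup>/
detail_route = re.compile(rf"^sites/(?P<id>{UUID_LOOKUP_REGEX})/$")


def match_detail(path):
    """Return the captured UUID string, or None if the path does not route."""
    m = detail_route.match(path)
    return m.group("id") if m else None


site_id = uuid.uuid4()
print(match_detail(f"sites/{site_id}/") == str(site_id))  # True
print(match_detail("sites/42/"))                          # None
```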

More notes on the why and the how in this article: https://www.stevenmoseley.com/blog/uuid-primary-keys-django-rest-framework-2-steps

Steven Moseley
  • This is incorrect. Postgres does not order the rows on disk by primary key. Tables are written in chunks, when a row is added or updated it is placed at the end of the last chunk. – Nicholas E. Jul 19 '21 at 15:22
  • You have an error on your blog post. You forgot the class Meta: abstract=True and the solution does not work @steven-moseley https://www.stevenmoseley.com/blog/tech/uuid-primary-keys-django-rest-framework-2-steps – Babak Bandpey Dec 13 '22 at 12:58
  • The solution works fine when the instructions are followed correctly. I've used it in numerous apps. – Steven Moseley Jan 10 '23 at 21:55
  • All the points you bring up are currently correct, but might be done away with whenever https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/ lands and uuid variant 7 becomes an option. – plunker Mar 14 '23 at 21:46

I ran into a similar situation and found in the official Django documentation that the object_id doesn't have to be of the same type as the primary key of the related model. For example, if you want your generic relationship to work with both IntegerField and CharField IDs, just make your object_id a CharField. Since integers can be coerced into strings, it'll be fine. The same goes for UUIDField.

Example:

class Vote(models.Model):
    user         = models.ForeignKey(User)
    content_type = models.ForeignKey(ContentType)
    object_id    = models.CharField(max_length=50) # <<-- This line was modified 
    object       = generic.GenericForeignKey('content_type', 'object_id')
    vote         = models.SmallIntegerField(choices=SCORES)
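A plain-Python sketch of why this works: whatever the primary key type, `str()` gives a short text form that fits the `CharField(max_length=50)`, and it converts back once the target model (and therefore its key type) is known from `content_type`. `as_object_id` is a hypothetical helper, not part of Django. The last line shows why the original `PositiveIntegerField` (documented as safe up to 2147483647) cannot hold a UUID.

```python
import uuid

POSITIVE_INTEGER_MAX = 2147483647  # PositiveIntegerField's documented safe maximum
OBJECT_ID_MAX_LENGTH = 50          # max_length of the CharField object_id above


def as_object_id(pk):
    """Hypothetical helper: store any primary key as text, like the CharField."""
    text = str(pk)
    if len(text) > OBJECT_ID_MAX_LENGTH:
        raise ValueError("primary key too long for object_id")
    return text


int_pk = 42
uuid_pk = uuid.uuid4()

# Both kinds of key fit in the text column...
print(as_object_id(int_pk), len(as_object_id(uuid_pk)))  # 42 36

# ...and convert back losslessly when the target key type is known.
assert int(as_object_id(int_pk)) == int_pk
assert uuid.UUID(as_object_id(uuid_pk)) == uuid_pk

# A UUID's 128-bit integer value can never fit a PositiveIntegerField.
print(uuid_pk.int > POSITIVE_INTEGER_MAX)  # True
```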
Jordi

This can be done with a custom abstract base model, using the following steps.

First, create a folder in your project called basemodel, then add an abstractmodelbase.py file to it with the following contents:

from django.db import models
import uuid


class BaseAbstractModel(models.Model):

    """
     This model defines base models that implements common fields like:
     created_at
     updated_at
     is_deleted
    """
    # primary_key=True already implies unique=True
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    created_at = models.DateTimeField(auto_now_add=True, editable=False)
    updated_at = models.DateTimeField(auto_now=True, editable=False)
    is_deleted = models.BooleanField(default=False)

    def soft_delete(self):
        """Soft-delete a model instance: flag it instead of deleting the row."""
        self.is_deleted = True
        self.save()

    class Meta:
        abstract = True
        ordering = ['-created_at']

Second, in each app's models.py, do this:

from django.db import models
from basemodel.abstractmodelbase import BaseAbstractModel

# Create your models here.

class Incident(BaseAbstractModel):

    """ Incident model  """

    place = models.CharField(max_length=50, blank=False, null=False)
    personal_number = models.CharField(max_length=12, blank=False, null=False)
    description = models.TextField(max_length=500, blank=False, null=False)
    action = models.TextField(max_length=500, blank=True, null=True)
    image = models.ImageField(upload_to='images/', blank=True, null=True)
    incident_date = models.DateTimeField(blank=False, null=False) 

So the Incident model above inherits all the fields of BaseAbstractModel.
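As a rough plain-Python analogue of the same pattern, using only the standard library: every subclass inherits the shared fields, each instance gets its own UUID from a default factory (the role `default=uuid.uuid4` plays above), and `soft_delete` only flips a flag. `BaseRecord` and this `Incident` are hypothetical stand-ins; unlike an abstract Django model, they never touch a database.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass
class BaseRecord:
    """Shared fields for every subclass, like BaseAbstractModel."""
    id: uuid.UUID = field(default_factory=uuid.uuid4)   # like default=uuid.uuid4
    created_at: datetime = field(default_factory=_now)  # like auto_now_add=True
    is_deleted: bool = False

    def soft_delete(self):
        """Flag the record instead of removing it."""
        self.is_deleted = True


@dataclass
class Incident(BaseRecord):
    place: str = ""
    description: str = ""


a = Incident(place="Gate 3", description="Spill")
b = Incident(place="Lobby", description="Broken door")
a.soft_delete()
print(a.id != b.id, a.is_deleted, b.is_deleted)  # True True False
```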

Anatol

The question can be rephrased as "is there a way to get Django to use a UUID for all database ids in all tables instead of an auto-incremented integer?".

Sure, I can do:

id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

in all of my tables, but I can't find a way to do this for:

  1. 3rd party modules
  2. Django generated ManyToMany tables

So, this appears to be a missing Django feature.

EMS