Is there a way to get the primary keys of the items you have created using the bulk_create feature in Django 1.4+?
-
I'd also love to know how people tackle this. I suppose you'd have to do something like lock the table, run the bulk_create, query for all new records and then unlock the table? It seems quite clear from the docs that bulk_create does not return the auto-increment keys, so the only way around this is a convoluted workaround. The other method I suppose would be to have another table which you use to keep track of used primary keys in order, so you allocate a block of IDs beforehand, run the bulk_create, and then you know the expected primary keys. I'm not happy with either idea :( – DanH Jun 07 '13 at 05:34
-
There seems to be an effort to tackle this in Django dev: https://code.djangoproject.com/ticket/19527 – DanH Jun 07 '13 at 07:12
-
Oh yeah! It appears my ~4-year-old proposal just made it into stock Django 1.10 for all of us to enjoy. :-) Works, I guess, only for Postgres for now. – Tuttle Aug 02 '16 at 08:45
-
It is now possible with Django 1.10 and PostgreSQL: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create – Maxime R. Aug 16 '16 at 16:07
-
Hopefully there will be support for MySQL too – Shift 'n Tab Dec 23 '16 at 11:02
11 Answers
2016
Since Django 1.10 it's now supported (on PostgreSQL only); here is a link to the docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
>>> list_of_objects = Entry.objects.bulk_create([
... Entry(headline="Django 2.0 Released"),
... Entry(headline="Django 2.1 Announced"),
... Entry(headline="Breaking: Django is awesome")
... ])
>>> list_of_objects[0].id
1
From the change log:
Changed in Django 1.10: Support for setting primary keys on objects created using bulk_create() when using PostgreSQL was added

-
What if in MySQL? Do the entries created by bulk_create have an id value in the database? – Mohammed Shareef C Nov 16 '17 at 07:06
-
@MohammedShareefC It will get a primary key in the database, but the list returned by the `bulk_create` method is the same one you provided, and the local objects (members of that list) do not have it set, as [pyriku demonstrates in his answer](https://stackoverflow.com/a/16981733/2547556). – Yushin Washio Feb 15 '19 at 14:49
-
On databases that support it (all except PostgreSQL < 9.5 and Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it). – eugene May 25 '20 at 04:31
-
I just noticed setting ignore_conflicts to True is the cause of this – The Voyager May 10 '21 at 08:22
-
Just found this on the docs: On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it). – The Voyager May 10 '21 at 08:25
-
Thank you so much, been looking for an answer for this problem for a good number of hours! – Kais Ben Daamech Jul 22 '21 at 09:04
-
On PostgreSQL, update_conflicts=True is also returning None instead of pk – Abdul Halim Chowdhury Jan 28 '23 at 05:16
According to the documentation you can't do it: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
bulk_create is just for that: creating a lot of objects in an efficient way, saving a lot of queries. But that means that the response you get is kind of incomplete. If you do:
>>> categories = Category.objects.bulk_create([
...     Category(title="Python", user=user),
...     Category(title="Django", user=user),
...     Category(title="HTML5", user=user),
... ])
>>> [x.pk for x in categories]
[None, None, None]
That doesn't mean your categories don't have pks, just that the query didn't retrieve them (if the key is an AutoField). If you want the pks for some reason you will need to save the objects in a classic way.
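For instance, a minimal sketch of that classic way (not from the original answer; it assumes the same Category model as above). Saving each object individually sets its pk, at the cost of one INSERT per row:
from django.db import transaction

categories = [
    Category(title="Python", user=user),
    Category(title="Django", user=user),
    Category(title="HTML5", user=user),
]

# One INSERT per object; save() retrieves and sets the pk on each instance.
with transaction.atomic():
    for category in categories:
        category.save()

print([c.pk for c in categories])  # e.g. [1, 2, 3]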

-
I think that's the point of the question, or at least how I'd interpret it, i.e.: What techniques do people use to get around this limitation of `bulk_create`, in order to retrieve the created IDs reliably? – DanH Jun 08 '13 at 07:58
-
There is an open PR to add support for returning IDs from bulk_create here: https://github.com/django/django/pull/5166 Notably, Postgres supports returning IDs, so there is a way to get IDs back immediately through a raw SQL operation. – gordonc Dec 08 '15 at 17:41
Two approaches I can think of:
a) You could do:
category_ids = list(Category.objects.values_list('id', flat=True))  # list() forces evaluation before the insert
categories = Category.objects.bulk_create([
    Category(title="title1", user=user),
    Category(title="title2", user=user),
    Category(title="title3", user=user),
])
new_categories_ids = Category.objects.exclude(id__in=category_ids).values_list('id', flat=True)
This could be a little expensive if the queryset is extremely huge.
b) If the model has a created_at field:
now = datetime.datetime.now()
categories = Category.objects.bulk_create([
    Category(title="title1", user=user, created_at=now),
    Category(title="title2", user=user, created_at=now),
    Category(title="title3", user=user, created_at=now),
])
new_cats = Category.objects.filter(created_at__gte=now).values_list('id', flat=True)
This has the limitation of having a field that stores when the object was created.

-
You know, I already have a `date_created` field, so this could work, although it's minimal effort to add one in anyway. My only concern is that multiple queries could hit the DB at the same time, so I suppose I need to implement some sort of locking mechanism before the `bulk_create` and after the `created_at` query. – DanH Jun 09 '13 at 06:27
-
Yeah, atomic transactions could be used to ensure race conditions are avoided. – karthikr Jun 09 '13 at 14:27
-
Regarding the first approach, in Django 1.10, values_list('id', flat=True) returns a queryset, which seems to be evaluated after bulk_create is called - wrapping category_ids in list() to force database query, helps. – George Sep 12 '16 at 08:55
-
@deathangel908 Don't do `max(id)`, I tried it and ran into problems. The MariaDB documentation explicitly states not to assume anything else about the PK other than uniqueness. – Patrick Sep 17 '20 at 08:18
-
Actually my colleague has suggested the following solution which seems all so obvious now. Add a new column called bulk_ref which you populate with a unique value and insert for every row. Afterwards simply query the table with the bulk_ref set beforehand and voilà, your inserted records are retrieved. e.g.:
cars = [
    Car(
        model="Ford",
        color="Blue",
        price="5000",
        bulk_ref=5,
    ),
    Car(
        model="Honda",
        color="Silver",
        price="6000",
        bulk_ref=5,
    ),
]
Car.objects.bulk_create(cars)
qs = Car.objects.filter(bulk_ref=5)

-
It's not a good practice to add additional fields to your model to work around query problems. – max Jul 19 '15 at 18:50
-
While this is true, bulk inserts should be considered an optimization anyway, one which may necessarily compromise the design. There's a tension between "not fast enough" and "not a perfect design" to be balanced here. Until the Django PR 5166 goes in, this is probably a reasonable compromise for teams that need the optimization of a bulk insert. – Scott A Jan 22 '16 at 00:33
-
If the bulk create is called multiple times at different times in the application, then we need to update bulk_ref every time, for which we'll need a static variable ref – varun Apr 15 '16 at 10:59
-
@varun I can't remember how we actually ended up implementing this, the bulk_ref might have been a UUID or similar random number. Doesn't need to be sequential or relative to other bulk_refs – DanH Apr 15 '16 at 11:37
-
@DanH then we'll have to check every time whether a random number or uuid already exists in the database – varun Apr 15 '16 at 14:06
-
+1 This solution already worked for me because I already had a way to query the inserted values uniquely. – Justin Meiners Jul 22 '16 at 00:57
-
@varun [Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.](https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions) – DylanYoung Aug 09 '17 at 17:37
-
@DanH seems like a reasonable choice to avoid queries, and adding an extra field for this purpose might actually be very helpful. – varun Aug 11 '17 at 06:28
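Pulling the answer and the comment thread above together, here is a minimal sketch of the UUID variant (the UUIDField declaration and the variable names are assumptions for illustration, not from the original answer):
import uuid

# Assumes the Car model declares something like:
#   bulk_ref = models.UUIDField(null=True, db_index=True)
batch_ref = uuid.uuid4()  # collision odds are negligible, so no existence check is needed
cars = [
    Car(model="Ford", color="Blue", price="5000", bulk_ref=batch_ref),
    Car(model="Honda", color="Silver", price="6000", bulk_ref=batch_ref),
]
Car.objects.bulk_create(cars)
inserted = Car.objects.filter(bulk_ref=batch_ref)  # the inserted rows, now with their pks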
I will share with you AUTO_INCREMENT handling in InnoDB (MySQL) and an approach to get the primary keys when using bulk_create (Django).
According to the bulk_create doc: If the model’s primary key is an AutoField it does not retrieve and set the primary key attribute, as save() does, unless the database backend supports it (currently PostgreSQL).
So we need to find out the cause of the problem in Django or MySQL before looking for a solution.
The AutoField in Django is actually AUTO_INCREMENT in MySQL. It is used to generate a unique identity for new rows (ref).
You want to bulk_create objects (Django), which means inserting multiple rows in a single SQL query. But how can you retrieve the most recent automatically generated PK (primary key)? Thanks to LAST_INSERT_ID: it returns the first value automatically generated by the most recently executed INSERT statement... This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
I encourage you to read AUTO_INCREMENT Handling in InnoDB and to read the Django code django.db.models.query.QuerySet.bulk_create to learn why Django does not support it for MySQL yet. It's interesting. Come back here and comment your idea, please.
Next, I will show you sample code:
from django.db import connections, models, transaction
from django.db.models import AutoField, sql


def dict_fetch_all(cursor):
    """Return all rows from a cursor as a dict."""
    columns = [col[0] for col in cursor.description]
    return [
        dict(zip(columns, row))
        for row in cursor.fetchall()
    ]


class BulkQueryManager(models.Manager):
    def bulk_create_return_with_id(self, objs, batch_size=2000):
        self._for_write = True
        # Skip the AutoField: the database generates it for us.
        fields = [f for f in self.model._meta.concrete_fields if not isinstance(f, AutoField)]
        created_objs = []
        with transaction.atomic(using=self.db):
            with connections[self.db].cursor() as cursor:
                for item in [objs[i:i + batch_size] for i in range(0, len(objs), batch_size)]:
                    query = sql.InsertQuery(self.model)
                    query.insert_values(fields, item)
                    for raw_sql, params in query.get_compiler(using=self.db).as_sql():
                        cursor.execute(raw_sql, params)
                    # cursor.lastrowid is the first AUTO_INCREMENT value generated by
                    # the INSERT above; fetch the rows of this batch back by id range.
                    raw = "SELECT * FROM %s WHERE id >= %s ORDER BY id DESC LIMIT %s" % (
                        self.model._meta.db_table, cursor.lastrowid, cursor.rowcount
                    )
                    cursor.execute(raw)
                    created_objs.extend(dict_fetch_all(cursor))
        return created_objs


class BookTab(models.Model):
    name = models.CharField(max_length=128)

    bulk_query_manager = BulkQueryManager()

    class Meta:
        db_table = 'book_tab'


def test():
    x = [BookTab(name="1"), BookTab(name="2")]
    create_books = BookTab.bulk_query_manager.bulk_create_return_with_id(x)
    print(create_books)  # [{'id': 2, 'name': '2'}, {'id': 1, 'name': '1'}]
The idea is to use a cursor to execute the raw insert SQL and then fetch back the created records. According to AUTO_INCREMENT handling in InnoDB, it makes sure that there will be no records interrupting your objs from PK cursor.lastrowid - len(objs) + 1 up to cursor.lastrowid.
Bonus: It's running in production in my company. But you need to care about the size effect, which is why Django does not support it.

-
Where can I find the explanation of why Django does not support bulk_create, and what do you mean by size effect? – Mathijs Feb 08 '22 at 10:15
-
I have 140m+ rows in my table and 1m rows are added each day. Could I use this implementation without problems? – Mathijs Feb 14 '22 at 21:00
-
I have tried this solution and it doesn't always work. It looks like the lastrowid is fine, but the wrong created objects get returned sometimes. E.g. lastrowid = 10 and limit is 20; I get back rows with id 12 to 22 instead of 10 to 20. (It happens very rarely.) – Mathijs Feb 21 '22 at 17:47
# datatime.py
# my datetime functions
import datetime
import time

def getTimeStamp(needFormat=0, formatMS=True):
    if needFormat != 0:
        return datetime.datetime.now().strftime(f'%Y-%m-%d %H:%M:%S{r".%f" if formatMS else ""}')
    else:
        ft = time.time()
        return (ft if formatMS else int(ft))

def getTimeStampString():
    return str(getTimeStamp()).replace('.', '')

# model
bulk_marker = models.CharField(max_length=32, blank=True, null=True, verbose_name='bulk_marker', help_text='ONLYFOR_bulkCreate')

# views
import .........getTimeStampString
bulk_marker = getTimeStampString()  # generate once so every row shares the same marker
data_list = [
    Category(title="title1", bulk_marker=bulk_marker),
    ...
]
# bulk_create
Category.objects.bulk_create(data_list)
# Get primary key ids
Category.objects.filter(bulk_marker=bulk_marker).values_list('id', flat=True)

I have tried many strategies to get around this limitation of MariaDB/MySQL. The only reliable solution I came up with at the end was to generate the primary keys in the application. DO NOT generate INT AUTO_INCREMENT PK fields yourself; it won't work, not even in a transaction with isolation level serializable, because the PK counter in MariaDB is not protected by transaction locks.
The solution is to add unique UUID fields to the models, generate their values in the model class, and then use that as their identifier. When you save a bunch of models to the database, you still won't get back their actual PK, but that's fine, because in subsequent queries you can uniquely identify them with their UUID.
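A minimal sketch of that approach, assuming an illustrative Item model (the names are not from the original answer):
import uuid
from django.db import models

class Item(models.Model):
    # Application-generated unique identifier, independent of the DB primary key.
    uuid = models.UUIDField(default=uuid.uuid4, unique=True, editable=False)
    name = models.CharField(max_length=128)

items = [Item(name="a"), Item(name="b")]
Item.objects.bulk_create(items)

# On MySQL/MariaDB each items[n].pk is still None, but the UUIDs were set locally
# by the field default, so the inserted rows can be identified reliably:
inserted = Item.objects.filter(uuid__in=[item.uuid for item in items])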

The Django documentation currently states under the limitations:
If the model’s primary key is an AutoField it does not retrieve and set the primary key attribute, as save() does.
But there is good news: there have been a couple of tickets talking about bulk_create from memory. The ticket listed above is the one most likely to have a solution which will soon be implemented, but obviously there is no guarantee on time or whether it will ever make it.
So there are two possible solutions:
Wait and see if this patch makes it to production. You can help with this by testing out the stated solution and let the django community know your thoughts / issues. https://code.djangoproject.com/attachment/ticket/19527/bulk_create_and_create_schema_django_v1.5.1.patch
Override / write your own bulk insert solution.

Probably the simplest workaround is manually assigning primary keys. It depends on the particular case, but sometimes it's enough to start with max(id)+1 from the table and assign numbers incrementing on every object. However, if several clients may insert records simultaneously, some lock may be needed.
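As a rough illustration only (a sketch of the idea above; as other answers and comments here warn, this is unsafe with concurrent writers unless you also take an explicit lock):
from django.db import transaction
from django.db.models import Max

with transaction.atomic():
    # Unsafe without an exclusive lock: a concurrent insert can claim the same ids.
    start = (Category.objects.aggregate(max_id=Max('id'))['max_id'] or 0) + 1
    categories = [
        Category(id=start + offset, title=title, user=user)
        for offset, title in enumerate(["title1", "title2", "title3"])
    ]
    Category.objects.bulk_create(categories)
# The pks are known because they were assigned in the application.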

This doesn't work in stock Django, but there is a patch in the Django bug tracker that makes bulk_create set the primary keys for created objects.

The approach suggested by @Or Duan works for PostgreSQL when using bulk_create with ignore_conflicts=False. When ignore_conflicts=True is set, you don't get the values for the AutoField (usually ID) in the returned objects.
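For illustration, a small sketch of the difference, reusing the Entry model from the first answer (behavior as described above, on PostgreSQL):
with_pks = Entry.objects.bulk_create([Entry(headline="a")])
print(with_pks[0].pk)  # set on PostgreSQL, e.g. 1

without_pks = Entry.objects.bulk_create(
    [Entry(headline="b")], ignore_conflicts=True
)
print(without_pks[0].pk)  # None: ignore_conflicts disables setting the pk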
