36

I am using Django 1.9. I have a Django table that represents the value of a particular measure, by organisation by month, with raw values and percentiles:

class MeasureValue(models.Model):
    org = models.ForeignKey(Org, null=True, blank=True)
    month = models.DateField()
    calc_value = models.FloatField(null=True, blank=True)
    percentile = models.FloatField(null=True, blank=True)

There are typically 10,000 or so per month. My question is about whether I can speed up the process of setting values on the models.

Currently, I calculate percentiles by retrieving all the measurevalues for a month using a Django filter query, converting it to a pandas dataframe, and then using scipy's rankdata to set ranks and percentiles. I do this because pandas and rankdata are efficient, able to ignore null values, and able to handle repeated values in the way that I want, so I'm happy with this method:

records = MeasureValue.objects.filter(month=month).values()
df = pd.DataFrame.from_records(records)
# use calc_value to set percentile on each row, using scipy's rankdata
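For illustration only, the elided ranking step might look roughly like the sketch below. It uses pandas' own `rank(pct=True)` (which skips nulls and averages ties) in place of the `rankdata` call the question refers to, and the frame contents and `org_id` column name are assumptions:

```python
import numpy as np
import pandas as pd

# hypothetical frame standing in for the MeasureValue records
df = pd.DataFrame({
    'org_id': [1, 2, 3, 4],
    'calc_value': [0.2, 0.8, np.nan, 0.5],
})

# rank(pct=True) skips NaN values and averages ties; scale to 0-100
df['percentile'] = df['calc_value'].rank(pct=True) * 100
```

Rows with a null `calc_value` keep a null percentile, which matches the null-handling the question asks for.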

However, I then need to retrieve each percentile value from the dataframe, and set it back onto the model instances. Right now I do this by iterating over the dataframe's rows, and updating each instance:

for i, row in df.iterrows():
    mv = MeasureValue.objects.get(org=row.org, month=month)
    if (row.percentile is None) or np.isnan(row.percentile):
        row.percentile = None
    mv.percentile = row.percentile
    mv.save()

This is unsurprisingly quite slow. Is there any efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.

Richard
  • 62,943
  • 126
  • 334
  • 542

4 Answers

53

Atomic transactions can reduce the time spent in the loop:

from django.db import transaction

with transaction.atomic():
    for i, row in df.iterrows():
        mv = MeasureValue.objects.get(org=row.org, month=month)

        if (row.percentile is None) or np.isnan(row.percentile):
            # normalise NaN to None so it is stored as NULL
            row.percentile = None

        mv.percentile = row.percentile
        mv.save()

Django’s default behavior is to run in autocommit mode: each query is immediately committed to the database, unless a transaction is active.

By using `with transaction.atomic()` all the writes are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed statements, so the time per statement is greatly reduced.
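The same effect can be seen outside Django with a quick sqlite3 sketch (the table name and row count are made up): all the writes inside one transaction pay the commit cost once, instead of once per statement:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE measure (percentile REAL)')

# the `with` block is one transaction: a single COMMIT covers all 1000 writes
with conn:
    for i in range(1000):
        conn.execute('INSERT INTO measure (percentile) VALUES (?)', (i / 10.0,))

count = conn.execute('SELECT COUNT(*) FROM measure').fetchone()[0]
```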

ahmed
  • 5,430
  • 1
  • 20
  • 36
  • Thanks. Why does this reduce the time? The documentation isn't helping https://docs.djangoproject.com/en/1.9/topics/db/transactions/#controlling-transactions-explicitly – Richard Apr 21 '16 at 15:11
  • 2
    Because the queries are combined into a single large query, that is only executed once. Let's say saving to the database takes 10 ms: if you save 100 objects running autocommit, that takes 10 ms * 100, but if you use a transaction it will only take 10 ms * 1 (roughly) – Peter F Jun 17 '20 at 09:57
  • Very helpful answer. Actually I was looking for something else but got a very good alternative – kiran kumar Aug 31 '20 at 13:41
35

As of Django 2.2, you can use the bulk_update() queryset method to efficiently update the given fields on the provided model instances, generally with one query:

objs = [
    Entry.objects.create(headline='Entry 1'),
    Entry.objects.create(headline='Entry 2'),
]
objs[0].headline = 'This is entry 1'
objs[1].headline = 'This is entry 2'
Entry.objects.bulk_update(objs, ['headline'])

In older versions of Django you could use update() with Case/When, e.g.:

from django.db.models import Case, When

Entry.objects.filter(
    pk__in=headlines  # `headlines` is a pk -> headline mapping
).update(
    headline=Case(*[When(pk=entry_pk, then=headline)
                    for entry_pk, headline in headlines.items()]))
Eugene Yarmash
  • 142,882
  • 41
  • 325
  • 378
  • 1
    Thanks for putting the older version in there. I don't have access to `bulk_update` so getting clarity on the `update` call was reassuring. – NeilG Jun 09 '21 at 02:35
  • Is `Case`, `When` faster than `transactions` ? – Ahtisham Feb 04 '22 at 15:52
  • I found more detail on the behind the scenes of this here https://www.reddit.com/r/django/comments/mjpbij/running_a_bulk_update_efficiently_with_django/ – Shlomo Sep 04 '22 at 12:08
1

In my case, the value also needed to be wrapped in Value():

from django.db.models import Case, When, Value

# `headlines` is a pk -> headline mapping, e.g. {1: 'some_val1', 2: 'some_val2', ...}
Entry.objects.filter(
    pk__in=headlines
).update(
    headline=Case(*[When(pk=entry_pk, then=Value(headline))
                    for entry_pk, headline in headlines.items()]))
cconsta1
  • 737
  • 1
  • 6
  • 20
fromSelect
  • 23
  • 4
0

Actually, attempting @Eugene Yarmash's answer, I found I got this error:

FieldError: Joined field references are not permitted in this query

But I believe an iterated update() is still quicker than multiple save() calls, and wrapping the loop in a transaction should expedite things further.

So, for versions of Django that don't offer bulk_update, assuming the same data used in Eugene's answer, where headlines is a pk -> headline mapping:

from django.db import transaction

with transaction.atomic():
    for entry_pk, headline in headlines.items():
        Entry.objects.filter(pk=entry_pk).update(headline=headline)
NeilG
  • 3,886
  • 2
  • 22
  • 30