4

So I have a queryset to update

stories = Story.objects.filter(introtext="")
for story in stories:
    # just set it to the first 'sentence'
    story.introtext = story.content[0:story.content.find('.')] + ".</p>"
    story.save()

The save() operation completely kills performance, and in the process list there are multiple entries for "./manage.py shell" (yes, I ran this through the Django shell).

However, in the past I've run scripts that didn't need to call save(), since they were changing a many-to-many field, and those scripts were very fast. My project has this code, which could be relevant to why they performed so well:

from django.db.models import signals
from django.dispatch import receiver

@receiver(signals.m2m_changed, sender=Story.tags.through)
def save_story(sender, instance, action, reverse, model, pk_set, **kwargs):
    instance.save()

What is the best way to update a large queryset (10000+) efficiently?

straykiwi

2 Answers

7

Since the new introtext value depends on the content field of each object, you can't do a plain bulk update. But you can speed up saving the list of individual objects by wrapping the loop in a transaction:

from django.db import transaction

with transaction.atomic():
    stories = Story.objects.filter(introtext='')
    for story in stories:
        introtext = story.content[0:(story.content.find('.'))] + ".</p>" 
        Story.objects.filter(pk=story.pk).update(introtext=introtext)

transaction.atomic() will increase speed by an order of magnitude.

The filter(pk=story.pk).update() trick avoids the pre_save/post_save signals that a plain save() emits. It is also the officially recommended way to update a single field of an object.
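
For illustration, here is a minimal sketch of that difference, using a hypothetical post_save receiver on the Story model from the question:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Story)
def on_story_saved(sender, instance, **kwargs):
    print("post_save fired for", instance.pk)   # runs only on save(), not on update()

story = Story.objects.first()

story.introtext = "Updated."
story.save()                                    # emits pre_save and post_save

Story.objects.filter(pk=story.pk).update(introtext="Updated.")   # single UPDATE, no signals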

catavaran
  • Should I change that last filter to a get? – straykiwi Jan 20 '15 at 03:45
  • No, you shouldn't. Follow the documentation link in my answer and read from: "_If you’re just updating a record and don’t need to do anything with the model object, the most efficient approach is to call update(), rather than loading the model object into memory._" – catavaran Jan 20 '15 at 03:52
  • This was INCREDIBLY faster. It never occurred to me to wrap up into one single transaction. Thank you – straykiwi Jan 20 '15 at 19:51
  • @catavaran RE: `transaction.atomic() will increase speed by order of magnitude.` where can I read about this? Couldn't find info on the speed increase in the official documentation. – Murat Jumashev Feb 15 '21 at 06:42
  • @MuratJumashev For example https://www.postgresql.org/docs/12/populate.html#DISABLE-AUTOCOMMIT and https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-transaction-management.html – catavaran Feb 16 '21 at 07:25
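
On newer Django versions (2.2+) there is also QuerySet.bulk_update(), which batches per-row changes into a handful of UPDATE statements and skips save() and its signals entirely. A rough sketch of the same update with it, assuming the Story model from the question:

stories = list(Story.objects.filter(introtext=''))
for story in stories:
    story.introtext = story.content[:story.content.find('.')] + ".</p>"

# Each batch becomes a single UPDATE ... CASE WHEN ... query; no signals are sent.
Story.objects.bulk_update(stories, ['introtext'], batch_size=500)
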
0

You can use the built-in update() method on a queryset.

Example:

MyModel.objects.all().update(color='red')

In your case, you would need the F() expression (read more here) to refer to the instance's own attributes:

from django.db.models import F

stories = Story.objects.filter(introtext__exact='')
stories.update(introtext=F('content')[0:F('content').find('.')] + ".</p>")
levi
  • But the value I want to set differs between instances; it isn't static across all of them. The intro text takes the first sentence of that instance's content. I've also tried save(update_fields=["introtext"]) and it was just as bad performance-wise – straykiwi Jan 20 '15 at 03:07
  • AttributeError: 'F' object has no attribute 'find' – straykiwi Jan 20 '15 at 03:23
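
As the last comment shows, F() has no find() method and can't be sliced, so this answer's update() call fails as written. On newer Django versions (2.0+) the same idea could be expressed with the Left, StrIndex, and Concat database functions instead; a rough, untested sketch:

from django.db.models import Value
from django.db.models.functions import Concat, Left, StrIndex

# Keep everything before the first '.' and append ".</p>", all in one UPDATE.
# Rows whose content contains no '.' would need separate handling.
Story.objects.filter(introtext__exact='').update(
    introtext=Concat(Left('content', StrIndex('content', Value('.')) - 1), Value('.</p>'))
)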