1

I have a Story model with a M2M relationship to some Resource objects. Some of the Resource objects are missing a name so I want to copy the title of the Story to the assigned Resource objects.

Here is my code:

from collector import models
from django.core.paginator import Paginator

paginator = Paginator(models.Story.objects.all(), 1000)

def fix_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            name_story = story.title
            for r in story.resources.select_subclasses():
                if r.name != name_story:
                    r.name = name_story
                    r.save()
                    if len(r.name) == 0:
                        print("Something went wrong: " + name_story)
        print("done processing page %s out of %s" % (page, paginator.num_pages))

fix_issues()

I need to use a paginator because I'm dealing with a million objects. The weird part is that after calling fix_issues() about half of my resources that had no name, now have the correct name, while the other half still has no name. I can call fix_issues() again and again and every time more objects receive a name. This seems really weird to me, why would an object not be updated the first time but only the second time?

Additional information:

  • The "Something went wrong: " message is never printed.
  • I'm using select_subclasses from django-model-utils to iterate over all resources (any type).
  • The story.title is never empty.
  • No error message is printed, when I run these commands.
  • I did not override the save method of the Resource model (only the save method of the Story model).
  • I tried to use @transaction.atomic but the result was the same.

My Model:

class Resource(models.Model):
    name = models.CharField(max_length=200)
    # Important for retrieving the correct subtype.
    objects = InheritanceManager()

    def __str__(self):
        return str(self.name)


class CustomResource(Resource):
    homepage = models.CharField(max_length=3000, default="", blank=True, null=True)


class Story(models.Model):
    url = models.URLField(max_length=3000)
    resources = models.ManyToManyField(Resource)
    popularity = models.FloatField()

    def _update_popularity(self):
        self.popularity = 3

    def save(self, *args, **kwargs):
        super(Story, self).save(*args, **kwargs)
        self._update_popularity()
        super(Story, self).save(*args, **kwargs)

Documentation for the select_subclasses: http://django-model-utils.readthedocs.io/en/latest/managers.html#inheritancemanager

Further investigation: I thought that maybe select_subclasses did not return all the objects. Right now every story has exactly one resource. So it was easy enough to check that select_subclasses always returns one item. This is the function I used:

def find_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            assert(len(story.resources.select_subclasses()) == 1)
        print("done processing page %s out of %s" % (page, paginator.num_pages))

But again, this executes without any problems. So I don't thing the select_subclasses is to blame. I also checked if paginator.num_pages is right and it is. If i divide by 1000 (items per page) I get exactly the number of stories I have in my database.

user667804
  • 740
  • 6
  • 25
  • You'll need to show more details of your models. Does Story have any default ordering defined? What does `select_subclasses` do? Paginators aren't really the right tool here anyway; you might want to investigate the queryset [`iterator()`](https://docs.djangoproject.com/en/1.10/ref/models/querysets/#iterator) method. And apart from anything else, there's probably a much more efficient way of doing this, eg via `update()`. – Daniel Roseman Nov 20 '16 at 12:28
  • I added more code and a link to the library I'm using. Finding a workaround is not that important for me, I would rather understand why this approach fails. But thanks for the hint, will read up on update(). – user667804 Nov 20 '16 at 12:36
  • I'm not saying this is the issue (or an issue) 100%, but one thing that jumps up from your code is calling `super().save()` twice in the overridden save method. You're doing it with the same args and kwargs, so if the first `save()` pops sth from those, then you call save again with new params. Bad tactics either way, I suggest you call it only once - either before or after updating popularity. Also if you're not familiar with it, then M2M relations are not saved within the `save()` method but there's another separate method for that, look for it in the Django docs. – wanaryytel Nov 20 '16 at 14:35
  • But I never call Story.save() here. I only call the save method on my Resource object. Nevertheless, the reason I call it twice is that the update_popularity method computes something more complex (I removed the code here). If I would not save before I would get an error. Will check if there is a nicer way. But the main issue is, unfortunately, still not resolved. – user667804 Nov 20 '16 at 14:53

1 Answers1

0

I think I know what is happening:

The Paginator loads a Queryset and gives me the first n items. I process these and update some of the values. But for the next iteration the order of the items in the queryset changes (because I updated some of them and did not define an order). So I'm skipping over items that are now on the first page. I can avoid it by specifying an order (pk for example).

If you think I'm wrong, please let me know. Otherwise I will accept this as the correct answer. Thank you.

user667804
  • 740
  • 6
  • 25