Filtering Django QuerySet based on ManyToMany relationship existence

Question

The Models

Here is the basic models setup we have.

A List has many Items, and an Item can be in many Lists. For a given Item, if any of its Lists are good (i.e., list.bad == False) then the Item is good. If an Item doesn't appear in any good Lists, then it is bad.

We have a custom QuerySet for Items, with a method for returning only good Items, and a method for returning only bad Items.

class Item(models.Model):
    objects = ItemQuerySet.as_manager()
    name = models.CharField(max_length=255, unique=True)

class List(models.Model):
    name = models.CharField(max_length=255, unique=True)
    bad = models.BooleanField(default=True)
    items = models.ManyToManyField(Item, related_name='lists')

class ItemQuerySet(models.QuerySet):
    def bad(self):
        return self.exclude(lists__bad=False)

    def good(self):
         return self.filter(lists__bad=False)

The Scenario

Here's an example of a scenario we're having trouble with: one bad List, one good List, and two Items.

BadList:    GoodList:
- Item1     - Item1
- Item2

Since Item1 appears in at least one good list, it should come up in Item.objects.good(), and not in Item.objects.bad().

Since Item2 does not appear in any good list, it should come up in Item.objects.bad(), and not in Item.objects.good().

We can set up the scenario like so:

# Create the two lists.
>>> goodlist = List.objects.create(name='goodlist', bad=False)
>>> badlist = List.objects.create(name='badlist', bad=True)

# Create the two items.
>>> item1 = Item.objects.create(name='item1')
>>> item2 = Item.objects.create(name='item2')

# Item1 goes in both lists
>>> goodlist.items.add(item1)
>>> badlist.items.add(item1)

# Item2 only in badlist
>>> badlist.items.add(item2)

And, indeed, Item.objects.good() and Item.objects.bad() work as we expect:

>>> Item.objects.bad() # This returns what we want! Good!
<QuerySet [<Item: item2>]>

>>> Item.objects.good() # This returns what we want! Good!
<QuerySet [<Item: item1>]>

The Problem

Thanks for bearing with me. Here's where our custom QuerySet goes wrong. If we access the good() and bad() custom QuerySet methods through a single List's Items, we get incorrect results.

>>> badlist.items.bad() # WRONG! We want to ONLY see item2 here!
<QuerySet [<Item: item1>, <Item: item2>]

>>> badlist.items.good() # WRONG! We want to see item1 here!
<QuerySet []>

It seems like, when we do badlist.items.bad(), the query only considers badlist in determining if the Items are bad, instead of considering all Lists that the Items are in. But I'm confused as to why that would be the case.

My thought is that, in the ItemQuerySet.bad method, I want something like self.exclude(any__lists__bad=False) instead of just self.exclude(lists__bad=False). But of course that any__ keyword doesn't actually exist, and I'm not sure how to express that logic correctly in a Django QuerySet. It seems that using Q objects might be the way forward, but I'm still not quite sure how to express a query like this with Q objects.

In our actual database, there are less than 100 Lists, but millions of Items. So, for performance reasons it's ideal to do this with one query, rather than a property or multiple queries.

Cheers!

Brobin · Answer 1 · 2017-06-13T18:52:54.480

1

If you print out the query generated by badlist.items.bad() you'll see the issue: It will use a WHERE clause on the through table, thus limiting the lists to only badlist. You need to start from the Item level if you want to apply bad and good correctly, then filter by the items in the list.

item_ids = list(badlist.items.values_list('id'), flat=True)

Item.objects.bad().filter(id__in=item_ids)

Item.objects.good().filter(id__in=item_ids)

Edit: I can't test this without the schema, but I think you can use annotations to count up the number of lists and then filter via that

def annotate_good(self);
    return self.annotate(good=Count(Case(When(lists__bad=False, then=1), default=0)))

def good(self):
    return self.annotate_good().exclude(good=0)

def bad(self):
    return self.annotate_good().filter(good=0)

Otherwise, if performance really is an issue, I would add a good or bad field to the Item model and update it when saving so that the querying becomes dead simple.

edited Jun 13 '17 at 18:52

answered Jun 13 '17 at 18:25

Brobin

3,241
2
19
35

Thanks for the answer! This does work properly, however I'm concerned that other people working with the code will attempt to use the `badlist.items.good()` route, not knowing any better. I want to avoid letting people get misled like that. Additionally, I meant to mention this in the original post: In our actual database, there are less than 100 Lists, but millions of Items. So, for performance reasons it's ideal to do this with one query, rather than a property or multiple queries. – Austin Rose Jun 13 '17 at 18:31
1

Ah, gotcha. I think you can do this using queryset annotation. I edited my answer above with a solution I believe should work. Otherwise, to enhance performance, I'd just add the bad/good column to Item and update that, which makes the query much simpler – Brobin Jun 13 '17 at 18:53

Filtering Django QuerySet based on ManyToMany relationship existence

The Models

The Scenario

The Problem

1 Answers1