1

Given these Django models:

from django.db import models

class Thing(models.model):
    name = models.CharField('Name of the Thing')

class Category(models.model):
    name = models.CharField('Name of the Category')
    things = models.ManyToManyField(Thing, verbose_name='Things', related_name='categories')

Note that all the categories a Thing is in can be found by:

thing = Thing.objects.get(id=1) # for example
cats = thing.categories.all() # A QuerySet

I'm really struggling to build a query set that returns all Things in all of a given set of Categories.

Let's say we have 5 categories, with IDs 1, 2, 3, 4, 5.

And say I have a subset of categories:

my_cats = Category.objects.filter(id__in=[2,3])

I want to find all Things that are in say categories, 2 AND 3.

I can find all Things in category 2 OR 3 easily enough. For example this:

Thing.objects.filter(categories__in=[2,3])

seems to return just that, Things in category 2 OR 3.

And something like:

Thing.objects.filter(Q(categories=2)|Q(categories=3))

also, but this returns nothing:

Thing.objects.filter(Q(categories=2)&Q(categories=3))

I might envisage something like:

Thing.objects.filter(categories__contains=[2,3])

but of course that's a dream as contains operates on strings not ManyToMany sets.

Is there a standard trick here I'm missing?

I spun up a sandbox here to test and demonstrate:

https://codesandbox.io/p/sandbox/django-m2m-test-cizmud

It implements this simple pair of models and populates the database with a small set of things and categories and tests the queries, here's the latest state of it:

print("Database contains:")
for thing in Thing.objects.all():
    print(
        f"\t{thing.name} in categorties {[c.id for c in thing.categories.all()]}")
print()

# This works fine. Prints:
# Cat1 OR Cat2: ['Thing 1', 'Thing 5', 'Thing 4']
things = Thing.objects.filter(
    Q(categories=1) | Q(categories=2)).distinct()
print(f"Cat1 OR Cat2: {[t.name for t in things]}")

# We would love this to return Thing4 and thing5
# The two things in the test data set that are in
# Category 2 and in Category 3.
# But this does not work. It prints:
# Cat2 AND Cat3: []
# because
# What does yield ['Thing 4', 'Thing 5']?
print("\nAiming to to get: ['Thing 4', 'Thing 5']")
things = Thing.objects.filter(
    Q(categories=2) & Q(categories=3)).distinct()
print(f"Try 1: Cat2 AND Cat3: {[t.name for t in things]}")

# This also fails, producing an OR not AND
things = Thing.objects.filter(categories__in=[2, 3]).distinct()
print(f"Try 2: Cat2 AND Cat3: {[t.name for t in things]}")

# Also fails
things = Thing.objects.filter(categories__in=[2, 3])\
                      .filter(categories=2).distinct()
print(f"Try 3: Cat2 AND Cat3: {[t.name for t in things]}")

# Also fails
things = Thing.objects.filter(categories__in=[2, 3], categories=2)\
                      .distinct()
print(f"Try 4: Cat2 AND Cat3: {[t.name for t in things]}")

and it's output:

Database contains:
        Thing 1 in categorties [1, 2]
        Thing 2 in categorties [3, 4]
        Thing 3 in categorties [5]
        Thing 4 in categorties [2, 3]
        Thing 5 in categorties [1, 2, 3]

Cat1 OR Cat2: ['Thing 1', 'Thing 5', 'Thing 4']

Aiming to to get: ['Thing 4', 'Thing 5']
Try 1: Cat2 AND Cat3: []
Try 2: Cat2 AND Cat3: ['Thing 1', 'Thing 4', 'Thing 5', 'Thing 2']
Try 3: Cat2 AND Cat3: ['Thing 1', 'Thing 4', 'Thing 5']
Try 4: Cat2 AND Cat3: ['Thing 1', 'Thing 4', 'Thing 5']

I guess if I can work it out in SQL, we can write us a custom lookup:

https://docs.djangoproject.com/en/4.2/howto/custom-lookups/

But why do I think this must already have been written? How this be such a unique and new use case?

Bernd Wechner
  • 1,854
  • 1
  • 15
  • 32
  • Wrong using with Q&Q and contains, because one object has a unique id, so it never has two different ids. – Blackdoor May 07 '23 at 11:47
  • @Bernd Wechner But maybe I don't understand something. If: categories__contains=[2,3] categories will return the id you want to compare against the contains string. How to understand this or do you have categories returns strings? – inquirer May 07 '23 at 14:02
  • @Blackdoor. Indeed. But `categories` is the related name of Category in Things. The default would be `category_set`. – Bernd Wechner May 07 '23 at 20:55
  • @inquirer. Apologies. As noted with Blackdoor, `categories` is the related name of the ManyToManyField Category, so Fields.categories is a set of categies (it's actually a ManyToManyDescriptor in the model and a ManyRelatedManager in the object). I'll clarify the question. – Bernd Wechner May 07 '23 at 21:00
  • Apologies for any confusion and hope the clarified question is is useful. – Bernd Wechner May 07 '23 at 21:09

1 Answers1

1
Thing.objects.annotate(cat_count=Count('id', filter=Q(categories__in=[2, 3]))).\
        filter(cat_count__gte=2).values('id', 'cat_count')

Try it. cat_count__gte=2 because the two numbers are 2, 3.

inquirer
  • 4,286
  • 2
  • 9
  • 16
  • Alas `Thing.objects.filter(Q(categories__in=[2,3]) & Q(categories=2))` also fails, and I have found nothing that works, and need a solution more than I need to understand why a given idea doesn't work. I can examine the SQL of each query to understand how it fails and why. I am struggling to find a Django syntax that works. The sandbox in the OP has a few efforts now in it, including ideas you've offered. Alas, no luck. – Bernd Wechner May 09 '23 at 04:52
  • Thing.objects.filter( Q(categories=2) & Q(categories=3)) returns an empty QuerySet because categories=2 must be and categories=3 at the same time. That is, id 2 cannot be id 3. To put it another way. Why don't you want to use: categories__in=[2, 3]? You probably want to get thing.categories.all() without using a loop, for each Thing? If so, then I don't know how to do it, I tried [here](https://stackoverflow.com/questions/76014537/efficient-way-to-gathering-data-by-datetime-range-for-multiple-rows-stored-in-an/76157614#76157614), but it is with a loop. – inquirer May 09 '23 at 10:13
  • Yes, it's fairly obvious why `Q(categories=2) & Q(categories=3)` return empty, I offer it only as an example of something tried and not working (I mean `Q(categories=2) | Q(categories=3)` works as expected after all, so a naive shot at moving from OR to AND is just that. No expectation it would work. `categories__in=[2, 3]` is fine except that is exactly the same as `Q(categories=2) | Q(categories=3)` and is Try 2 in the sample in the OP, it does not return what is desired. And yes, seeking a Query (examining every Thing is not sensible). – Bernd Wechner May 09 '23 at 10:19
  • Did I understand you correctly, you need to get, as in your example, what did "Database contains" return? – inquirer May 09 '23 at 10:27
  • 1
    Database contains is produced by the code in the OP, it is just a list of all Things in the database and their Categories, so I can assess the success of the queryset. I can see by eye that only Thing 4 and Thing 5 are in Category 2 and in Category 3. And that is the result I'm after. I should hope the code and sandbox are slowly clear, or? – Bernd Wechner May 09 '23 at 10:30
  • @Bernd Wechner updated answer. – inquirer May 09 '23 at 14:35
  • Brilliant! If you check the sandbox, I've implemented it there as a test. Two small things: 1) __get not needed, simply cat_count=2 works. I think this is because the `__in` filter in the annotation only returns cats in the set (extras are ignored implicitly, so the `gte` works implicitly) 2) No need for `.values()`, am happy just returning Things ;-). But otherwise the perfect insight is the filter argument to the Count aggregator! I will check the generated SQL and confirm as will and report back should I find a concern. – Bernd Wechner May 09 '23 at 21:43
  • Checked the SQL, and it's perfect. Thanks enormously! That filter on the count was the trick. Though as noted `_gte` not needed. simply = is fine. – Bernd Wechner May 09 '23 at 22:49