2

Djangonauts, I need to tap your brains.

In a nutshell, I have the following three models:

class Location(models.Model):
    name = models.CharField(max_length=100)


class Profile(models.Model):
    locations_of_interest = models.ManyToManyField(Location)


class Question(models.Model):
    locations = models.ManyToManyField(Location)

I want to find all Profiles which locations of interest intersect with the locations specified for a certain question. That’s easy:

question = Question.objects.first()

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all()
)

But in addition, I would also like to know to what extend the locations overlap.

In plain python, I could do something like this:

question_location_names = [l['name'] for l in question.locations.all()]

for profile in matching_profiles:
    profile_location_names = [l['name'] for l in profile.locations_of_interest.all()]
    intersection = set(question_location_names).intersection(profile_location_names)
    intersection_count = len(list(intersection))
    # then proceed with this number

However, it seems to me favourable to do the operation directly in the database if possible.

TL;DR

So my question is:

Is there a way to annotate the profile queryset with this intersection count and that way do the operation in the database?

I have tried several things, but I don’t think they are helpful for those who read this and might know an answer.

creimers
  • 4,975
  • 4
  • 33
  • 55

1 Answers1

2

You can perform this with an .annotate(..) this with a Count(..) on the locations_of_interest number:

from django.db.models import Count

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all()
).annotate(
    locnom=Count('locations_of_interest')
)

Now every matching_profiles instance, will have an attribute called locnom that contains the number of location of interests that matched with the filter.

Note that Profiles without such locations, will not be in the queryset, and that every Profile will occur at most once.

EDIT: counting multiple related non-overlapping (!) fields

You can extend this approach by counting non-overlapping joins, by using distinct=True:

from django.db.models import Count

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all(),
    industries_of_interest__in=question.industries.all()
).annotate(
    locnom=Count('locations_of_interest', distinct=True),
    indnom=Count('industries_of_interest', distinct=True)
)

Note however that this approach scales typically then exponentially with the number of JOINs, so this is typically not scalable if you would add tens our hundreds of annotations.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • This works indeed, thanks. What does not seem to work, however, is to pull the same trick with a second attribute, say `industries_of_interest` on `Profile` and `industries` on `Question`, at the same time. Is that right to your knowledge? – creimers May 28 '18 at 09:07
  • @creimers: I think you can solve this by using `distinct=True` in the count. (on both `Count`s. – Willem Van Onsem May 28 '18 at 09:11
  • ding dong. Now I'm all set. Will not add more than these two annotations. Dankjewel. – creimers May 28 '18 at 09:18
  • Any way to do this without the filtering? What if I still want to show all profiles, even if they're not in the question locations, but still annotate by the intersection of locations so that I can sort on the count of intersection? Would that have to be two queries chained together? – dster77 Jul 24 '18 at 17:34
  • edit: made it it's own question: https://stackoverflow.com/questions/51504634/django-intersect-count-annotate-for-sorting – dster77 Jul 24 '18 at 17:43
  • @dster77: So if I understand it correctly, you want to include `Profile`s as well with no intersecting `locations_of_interest`, and `industries_of_interest` as well, that is the only difference? (in that case `locnum`, etc. will be `0`). – Willem Van Onsem Jul 24 '18 at 18:32