Django filter exact m2m objects

Question

Let's say I have a team model, and a team has members.

So

class Team(models.Model):
    team_member = models.ManyToManyField('Employee')

class Employee(models.Model):
    ....

Lets say I have a list of employee ids like team_members = [1001, 1003, 1004] and I want to find the Team, that is made up of exactly those three members.

I don't want the team that has [1001, 1003, 1004, 1005] or the team that has [1001, 1003].

Only team [1001, 1003, 1004].

This is what I'm doing now:

teams = Team.objects.all()
for t in teams:
    if set([x.id for x in t.team_member.all()]) == set(team_members):
        team = t
if not team:
    team = Team.objects.create()
    team.team_member = team_members

But it seems a bit ham-handed. Is there a cleaner way, with fewer nested loops?

I think you can use `in` -> `Team.objects.filter(team_member__pk__in=team_members)` — Gocht, Mar 04 '16 at 19:11
That would return a team that contained those three members and others. I want EXACTLY those team members. — Rob L, Mar 04 '16 at 19:13
It returns a queryset of teams that contain ANY of `team_members`. I need the one team that is composed of ONLY those three team members. — Rob L, Mar 04 '16 at 19:20
Possible duplicate of [Django ManytoMany filter exact list](http://stackoverflow.com/questions/40505500/django-manytomany-filter-exact-list) — waterproof, Apr 14 '17 at 19:08

score 1 · Answer 1 · edited May 23 '17 at 12:09

The short answer

No, I don't know of a much simpler way in terms of code appearance.

However there are some things you could do to make your code a little more graceful and potentially a lot faster. Plus it is possible to do the work in the database, albeit quite inefficiently for large team sizes.

The DB option listed below is pretty much as ham-handed as the for loop you provided, but could be more efficient depending on your data set, DB, etc.

Longer answer: ways to be less 'ham-handed'

There are a couple of places I'd clean up the style here.

Plus, in my experience with Django, loops like the one you built do tend to become pretty expensive on large data sets. If you end up loading, say, 10,000 teams into memory, having the ORM convert them to Team objects, and then iterating over them, you'll probably see some significant slowdown.

Two things to try for speed & grace:

Use Team.values_list('team_members') for your in-python filter loop, which skips the step where Django organizes all of the SQL data into Model objects. I've found this to save lots of time instantiating objects (sometimes around an order of magnitude).
straighten out your set() calls. Currently you're re-converting team_members to a set() on every iteration, plus you're turning t.team_member implicitly into TeamMember objects (as they're fetched from the DB), then into a list of ids and then into a set. For the first item, just make a team_members_set = set(team_members) up front and reuse it. For the second item, you can do set(t.team_member.values_list('id', flat=True)) which will skip the heaviest ORM step of instantiating TeamMembers (which could be as bad as O(n^2) in your example depending on the data set and Django's caching).
use Team.objects.all().iterator() to not load the Teams all into memory at once. This will help if you're running into memory issues.

But with any performance optimization, of course test your perf with real or real-ish data to be sure you're not making things worse!

Longer answer: the DB option

After trying all manner of Q() manipulation and other approaches listed in the answers here, to no avail, I found this answer by @Todor.

Basically you need to do repeated filter()s, one for each team_member. On top of that you use a Count filter to make sure that you don't end up choosing a Team with a superset of the desired members.

desired_members = [1001, 1003, 1004]
initial_queryset = Team.objects.annotate(cnt=models.Count('team_members')).filter(cnt=len(desired_members))
matching_teams = reduce( # Can of course use a for loop if you prefer that to reduce()
    lambda queryset, member: queryset.filter(team_members=member),
    desired_members,
    initial_queryset
)

Note that the resulting query will likely have perf issues for large teams, since it will do one JOIN for every one of your desired_members. It'd be nice to avoid that but I don't know of another way to do this all in the database without changing your data structure. I'd love to learn a better way, and if you end up doing some perf testing I'd be curious to find what you learn!

Yes, that answer (which is newer than this question, I believe) looks like it would work. But I'm not sure it's better than what I'm doing. — Rob L, Apr 17 '17 at 13:30
Good point, I updated the answer to more directly answer your question (not just the DB-oriented answer that I answered previously). — waterproof, Apr 17 '17 at 16:12

Cagatay Barin · Accepted Answer · 2016-03-07T16:04:11.560

-1

Maybe you can use annotate for the count of team_member. Can you try this?

Team.objects.filter(team_member__pk__in=team_members).annotate(num_team=Count('team_member')).filter(num_team=len(team_members))

edited Mar 07 '16 at 16:04

answered Mar 04 '16 at 21:17

Cagatay Barin

3,428
2
24
43

OK, that works! After fixing the little syntax error (`Count(team_member)` -> `Count('team_member')`. It's definitely fewer loops, but I'm not sure it's as readable, because it's asking a different (though logically the same) question. Finally, that query is much, much slower. On a very small database it takes 14.5 ms, where the `.all()` query takes .3 ms. – Rob L Mar 07 '16 at 15:07
Sorry for the syntax error since i didn't have a chance to test the code. I just saw the part of the solution below which gives the all teams that have given members. I just added the annotate part of the code. If it's really too slow, we can think of another solution. – Cagatay Barin Mar 07 '16 at 15:16
It's not too slow, it's really cool. Just not convinced it's more readable. – Rob L Mar 07 '16 at 15:24
Uhh ok then :) Don't worry about anything else unless it doesn't give you what you want always. – Cagatay Barin Mar 07 '16 at 15:27
2

Wouldn't this get the team with any of the team members, then count the team to compare against the team count? So, then it would return all the teams that has one of the members and has the same number of members as 'team_members', but not the teams that have those and only those members. – jmerkow Feb 10 '17 at 20:37

Gocht · Answer 3 · 2016-03-04T19:27:35.277

-2

To get the Team with those exact three members you can use:

Team.objects.get(team_member__pk=team_members)  # This code was untested

You could also try with a list of Employee objects:

# team_members = Employee.objects.filter(pk__in=tem_members)

team_members = [<Employee: Employee object>, <Employee: Employee object>, <Employee: Employee object>]

Team.objects.get(team_member=team_members)

edited Mar 04 '16 at 19:27

answered Mar 04 '16 at 19:23

Gocht

9,924
3
42
81

Nope: `TypeError: int() argument must be a string or a number, not 'list'` – Rob L Mar 04 '16 at 19:25
I guess that belong to the first part of my answer, as I said, that code was untested, please try with the second part. – Gocht Mar 04 '16 at 19:26
No, the second returns the same queryset as `Team.objects.filter(team_member__pk__in=team_members)`, that is, all teams that have any of them as member, rather than their one team. – Rob L Mar 04 '16 at 19:28
@RobL It's a `get()` query, it can not return more than one result. – Gocht Mar 04 '16 at 19:30

Django filter exact m2m objects

3 Answers3

The short answer

Longer answer: ways to be less 'ham-handed'

Two things to try for speed & grace:

Longer answer: the DB option