38

Using SQLAlchemy 0.7.1 and a MySQL 5.1 database, I've got a many-to-many relationship set up as follows:

user_groups = Table('user_groups', Base.metadata,
    Column('user_id', String(128), ForeignKey('users.username')),
    Column('group_id', Integer, ForeignKey('groups.id'))
)

class ZKUser(Base, ZKTableAudit):
    __tablename__ = 'users'

    username   = Column(String(128), primary_key=True)
    first_name = Column(String(512))
    last_name  = Column(String(512))

    groups = relationship(ZKGroup, secondary=user_groups, backref='users')

class ZKGroup(Base, ZKTableAudit):
    __tablename__ = 'groups'

    id          = Column(Integer, primary_key=True)
    name        = Column(String(512))

Users can belong to multiple Groups, and Groups can contain multiple Users.

What I'm trying to do is build a SQLAlchemy query that returns only the Users who belong to at least one Group out of a list of Groups.

I played around with the in_ function, but that only seems to work for testing scalar values for membership in a list. I'm not much of a SQL writer, so I don't even know what kind of SELECT statement this would require.

coredumperror
  • 8,471
  • 6
  • 33
  • 42

3 Answers3

75

OK, after a lot of research, I realized that it was my own ignorance of SQL terminology that was holding me back. My search for a solution to find users belonging to "at least one of" the list of groups should have been to find users belonging to "any" of the list of groups. The any ORM function from SQLAlchemy does exactly what I needed, like so:

session.query(ZKUser).filter(ZKUser.groups.any(ZKGroup.id.in_([1,2,3])))

That code emits this SQL (on MySQL 5.1):

SELECT * FROM users 
WHERE EXISTS (
    SELECT 1 FROM user_groups, groups 
    WHERE users.id = user_groups.contact_id 
        AND groups.id = user_groups.group_id 
        AND groups.id IN (%s, %s, %s)
    )
coredumperror
  • 8,471
  • 6
  • 33
  • 42
22

According to the docs for any, the query will run faster if you use an explicit join instead:

Because any() uses a correlated subquery, its performance is not nearly as good when compared against large target tables as that of using a join.

In your case, you could do something like:

users = (
    session.query(ZKUser)
    .join(user_groups)
    .filter(user_groups.columns.group_id.in_([1, 2, 3]))
)

This emits SQL like:

SELECT *
FROM users
JOIN user_groups ON users.id = user_groups.user_id 
WHERE user_groups.group_id IN (1, 2, 3)
z0r
  • 8,185
  • 4
  • 64
  • 83
2

You can use in_:

session.query(ZKUser).filter(ZKGroup.id.in_([1,2])).all()
Zach Kelling
  • 52,505
  • 13
  • 109
  • 108
  • I'll give this a shot when I get back to the office after the weekend. This looks promising. Also, I like the coincidence of my class names and your username. :) My "ZK" stands for Zookeeper, the name of my app. It interfaces with MailChimp's API. – coredumperror Jun 26 '11 at 05:58
  • Unfortunately, this didn't work. I think it's because this isn't doing an actual join, so the SQL that gets emitted by this doesn't actually filter out the non-matching users. I've checked out the docs on SQLAlchemy for doing joins on many-to-many tables, but I'm totally lost. – coredumperror Jun 27 '11 at 20:33
  • 2
    You could try doing `session.query(ZKUser).join(ZKGroup).filter(..)`. It should work in 0.7.2, afaik. – Zach Kelling Jun 27 '11 at 21:20
  • `NotImplementedError: in_() not yet supported for relationships. For a simple many-to-one, use in_() against the set of foreign key values.` SQLAlchemy 1.3.18 – lowercase00 Jun 19 '21 at 21:42