0

So after reading SQLAlchemy ordering by count on a many to many relationship, I tried to replicate the result, but it's not working correctly. So my models are,

class Group(Base):
__tablename__='groups'
__table_args__={
    'mysql_engine':'InnoDB',
    'mysql_charset':'utf8',
}

id = Column(Integer, primary_key=True, unique=True)
name = Column(VARCHAR(30), primary_key=True, unique=True)
time = Column(DateTime, onupdate = datetime.datetime.now)
description = Column(VARCHAR(255))
creator_id = Column(Integer, ForeignKey('users.id'))
privacy = Column(SMALLINT) # 0 == public, 1 == friends, 2 == private

def __init__(self, name, descr, creator, privacy):
    self.name = name
    self.description = descr
    self.creator_id = creator
    self.privacy = privacy

class GroupUserRelationship(Base):
__tablename__='groupUserRelationships'
__table_args__={
    'mysql_engine':'InnoDB',
    'mysql_charset':'utf8',
}

id = Column(Integer, primary_key = True)
group_id = Column(Integer, ForeignKey('groups.id'))
user_id = Column(Integer, ForeignKey('users.id'))
time = Column(DateTime, onupdate=datetime.datetime.now)

def __init__(self, group, user):
    self.group_id = group
    self.user_id = user

and my sqlalchemy query is groups = session.query(Group, func.count(GroupUserRelationship.user_id).label('total')).join(GroupUserRelationship).group_by(Group).order_by('total DESC').limit(20).all(), but when I try to iterate over the list that it returns and access the group id, I get an AttributeError: 'NamedTuple' Does not have attribute id. Whats going wrong?

Community
  • 1
  • 1
Wiz
  • 4,595
  • 9
  • 34
  • 51

1 Answers1

2

a query of this form:

session.query(Group, func.count(GroupUserRelationship.user_id).label('somelabel'))

will return a list of tuples like this:

[
    (group1, 5),
    (group2, 7),
    ...
]

.. etc.

Iteration to get at group.id is then:

for group, user_id in session.query(Group, func.count(GUR.user_id).label('somelabel')).join(...):
    print group.id

For the count, the first technique that I feel is important is to get into the habit of not grouping by an entire row (i.e. group_by(Group)). While the query here can be made to work using that technique, it's poor practice because you are making the database do lots of extra work matching all the columns of the entire group table, when really all you need grouped is the single column GroupUserRelationship.user_id. I refer to this article http://weblogs.sqlteam.com/jeffs/archive/2005/12/14/8546.aspx for some exposition on that. The SQLAlchemy tutorial then features an example of this form here: http://docs.sqlalchemy.org/en/rel_0_7/orm/tutorial.html#using-subqueries .

The next thing that works really great in SQLAlchemy is to use a relationship() to establish a particular join path between two classes. so here is that, with the grouping expression done using the subquery. The particular trick used here, which is optional, is that you can say join(subquery, Group.gur) which means "join to this subquery using the equivalent join condition of the Group.gur relationship".

edited to illustrate a full round trip example

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Group(Base):
    __tablename__ = 'groups'

    id = Column(Integer, primary_key=True)
    name = Column(VARCHAR(30))
    gur = relationship("GroupUserRelationship")

class GroupUserRelationship(Base):
    __tablename__ = 'groupUserRelationships'

    id = Column(Integer, primary_key=True)
    group_id = Column(Integer, ForeignKey('groups.id'))

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)

s.add_all([
    Group(name='g1', gur=[GroupUserRelationship() for i in xrange(3)]),
    Group(name='g2', gur=[GroupUserRelationship() for i in xrange(8)]),
    Group(name='g3', gur=[GroupUserRelationship() for i in xrange(5)]),
    Group(name='g4', gur=[GroupUserRelationship() for i in xrange(1)]),
    Group(name='g5', gur=[GroupUserRelationship() for i in xrange(2)]),
])
s.commit()


gur_count = s.query(
                func.count(GroupUserRelationship.id).label('total'),
                GroupUserRelationship.group_id
            ).group_by(GroupUserRelationship.group_id).\
            subquery()

for group, gur_count in s.query(Group, gur_count.c.total).\
            join(gur_count, Group.gur).\
            order_by(gur_count.c.total):
    print "GROUP:", group.name, "GROUP ID:", group.id, "NUMBER OF GUR:", gur_count

output (minus SQL echoing, which is useful to see what's going on):

GROUP: g4 GROUP ID: 4 NUMBER OF GUR: 1
GROUP: g5 GROUP ID: 5 NUMBER OF GUR: 2
GROUP: g1 GROUP ID: 1 NUMBER OF GUR: 3
GROUP: g3 GROUP ID: 3 NUMBER OF GUR: 5
GROUP: g2 GROUP ID: 2 NUMBER OF GUR: 8
zzzeek
  • 72,307
  • 23
  • 193
  • 185
  • Is there a way to order them? – Wiz Sep 10 '12 at 02:07
  • It doesn't seem like the tuple contains the group id. When I print the response to the query I get a tuple that contains: [(, 1L), (, 2L)] – Wiz Sep 10 '12 at 20:59
  • (, 1L) is not a "tuple". the access pattern is the one I noted previously, "for group, group_id in s.query(...)" – zzzeek Sep 11 '12 at 00:28
  • It seems like the query is only returning the top group, with the most relationships;this is my query:`gur_count = session.query(func.count(GroupUserRelationship.id).label('total'), GroupUserRelationship.group_id).group_by(GroupUserRelationship.group_id).subquery() for group, user_id in session.query(Group, gur_count.c.total).join(gur_count, Group.gur).order_by(gur_count.c.total).all():` And then I print all the results from the for loop, and all I get is ['asdf', 1L], when the name is the first element. – Wiz Sep 11 '12 at 01:47
  • I've edited the script to include a working round trip example; please copy it exactly and run it to understand how it works before determining what's different about the example you're running on your end. – zzzeek Sep 11 '12 at 11:03
  • Wait, so you have to add a gur every time someone initiates a new relationship? – Wiz Sep 11 '12 at 22:41