SQLAlchemy, eager loading on object refresh

Question

If have following ORM setup in SQLAlchemy:

class Foo(Base):
    id = Column(Integer, primary_key=True)
    status = Column(String)
    barId = Column(Integer, ForeignKey("bar.id"))
    bar = relationship("Bar", lazy="joined")

class Bar(Base):
   id = Column(Integer, primary_key=True)

So I would like to always have the associated Bar object available for each Foo object. I often detach the Foo objects from the session and continue using its values and Bar's values. From time to time I need to update Foo's status field. In that case I create a new session, add the foo object to the session and commit it. After the commit the Bar object associated with the Foo object is invalidated but not reloaded by the commit's implicit refresh of the Foo object. After detaching the Foo object again from the session, the Bar object is no longer usable. The only way I found to work around that is to explicitly eager load the bar object after committing foo.

Example work flow:

session = Session()
foo = session.query(Foo).get(id) <-- foo.bar is automatically eager loaded
session.close()
....
session = Session()
session.add(foo)
foo.status = 'done'
session.commit()       <-- foo is commited and refreshed, foo.bar is not
session.refresh(foo)   <-- same here, foo.bar is not loaded
#foo.bar               <-- only explicit eager loading foo.bar here works
session.close()
....
foo.bar                <-- error if not explicitly eager loaded

I would like to use this setup for a few of those small Bar like objects. Requiring me to remember to always explicitly reload the foo.bar object is error prone. So my question is: can I eager load foo.bar in all situations, be it a query(), a commit() (implicit refresh) or an (explicit) refresh()?

score 5 · Accepted Answer · answered Jun 04 '13 at 00:17

First thing, a "commit()" is not a "refresh" - it actually expires all data, so you'd see all the mapped attributes are no longer present in foo.__dict__. The implicit refresh occurs when you touch those attributes again. It is a very common practice to simply set expire_on_commit=False within a Session, for those many many applications that don't require cross-transaction synchronization after a commit, so that would probably be the most implicit workflow.

Next thing, session.refresh(foo) will load bar using the configured eager loader. Not sure why you're seeing foo.bar not loaded, I checked and this feature goes way back at least to version 0.5. A simple test confirms it:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    status = Column(String)
    barId = Column(Integer, ForeignKey("bar.id"))
    bar = relationship("Bar", lazy="joined")

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)

s.add(Foo(id=1, bar=Bar()))
s.commit()

f1 = s.query(Foo).get(1)
f1.status = 'done'
s.commit()

assert 'bar' not in f1.__dict__
s.refresh(f1)
assert 'bar' in f1.__dict__
s.close()

assert f1.bar.id == 1

next thing, SQLAlchemy discourages using objects in their "detached" state for the general reason that your mapped object represents a proxy to an ongoing database transaction. That's why when the transaction ends, all data is expired. Personally, I don't think there's usually a valid reason that objects need to be used in a detached state; detachment is mainly for the purposes of transporting objects to other sessions, storing them in caches, stuff like that. But we do have lots of users that rely upon detached usage patterns in any case, and I've made sure I can support them to a reasonable degree, so I wouldn't worry too deeply about it.

In my particular setup the `s.commit()` does make a select on the foo object w/o joining it with bar (before I do touch any attr myself). I'm not sure why. I mistakenly took it for an implicit refresh as you pointed out. I can't preproduce this with your little demo. Eager loading does work with an explicit `s.refresh(foo)`. I've missed that somehow, I'm still learning/experimenting a lot with SQLAlchemy. — dieterg, Jun 04 '13 at 07:13
SQLAlchemy discourages using objects in their "detached" state, you say. My Foo objects live over several HTTP requests and I always close my orm session at the end of a request. I only need to update the foo object 1/100 requests the other 99 I just use the data. What design would you then recommend? 1) Copy Foo's and Bar's data to non orm objects and requery if I need to update. 2) Open a new orm session even if I do not need to update/refresh Foo. 3) ... — dieterg, Jun 04 '13 at 07:27
usually a web application loads all the data it needs on every request. if you load your Foo object on request number one, then request number two comes in six hours later, have you been storing those "Foo" objects in memory those whole six hours? How do you know they are still up to date? This is really a caching pattern. SQLAlchemy's recommendation for caching is that objects are re-attached or merged to a new Session on each request - see [dogpile caching](http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#dogpile-caching) for recommended design guidelines. — zzzeek, Jun 05 '13 at 15:48
Just for information: I'm designing and implementing a distributed system using XMLRPC over HTTP. I'm implementing both sides so I know I will be receive requests on regular basis, how log the Foo objects will live and that they won't get out of date while I keep them in memory. My questions are of course biased towards this design. — dieterg, Jun 05 '13 at 19:54

SQLAlchemy, eager loading on object refresh

1 Answers1