3

I have a strange issue that I simply cannot resolve. Essentially, I have a model and system which works perfectly - except under a very specific (and seemingly arbitrary) set of circumstances.

I'll paste the model in a second but here's the idea. I want certain tables to be versioned. That means for a given table, I break it into two tables, the Master part which has the natural keys for the object, and the Version table which has all the associated data which may change. Then some of my models of course have a relationship, so I create a join table that links versions.

Here are the models:

class Versioned(object):

    def __init__(self, **kwargs):

        super(Versioned, self).__init__(**kwargs)

        self.active = True
        self.created_on = datetime.datetime.now()

    active = Column(BOOLEAN)
    created_on = Column(TIMESTAMP, server_default=func.now())

    def __eq__(self, other):

        return self.__class__ == other.__class__ and \
            all([getattr(self, key) == getattr(other, key)
                for key in self.comparison_keys
                ])

    def __ne__(self, other):

        return not self.__eq__(other)

    comparison_keys = []

class Parent(Base):

    __tablename__ = 'parent'

    id = Column(INTEGER, primary_key=True)

    name = Column(TEXT)

    versions = relationship("ParentVersion", back_populates="master")

    children = relationship("Child", back_populates="parent")

    @property
    def current_version(self):
        active_versions = [v for v in self.versions if v.active==True]

        return active_versions[0] if active_versions else None

class ParentVersion(Versioned, Base):

    __tablename__ = 'parent_version'

    id = Column(INTEGER, primary_key=True)

    master_id = Column(INTEGER, ForeignKey(Parent.id))

    address = Column(TEXT)

    master = relationship("Parent", back_populates="versions")

    children = relationship("ChildVersion",
        secondary=lambda : Parent_Child.__table__
    )

class Child(Base):

    __tablename__ = 'child'

    id = Column(INTEGER, primary_key=True)

    parent_id = Column(INTEGER, ForeignKey(Parent.id))

    name = Column(TEXT)

    versions = relationship("ChildVersion", back_populates="master")

    parent = relationship("Parent", back_populates="children")

    @property
    def current_version(self):
        active_versions = [v for v in self.versions if v.active==True]

        return active_versions[0] if active_versions else None


class ChildVersion(Versioned, Base):

    __tablename__ = 'child_version'

    id = Column(INTEGER, primary_key=True)

    master_id = Column(INTEGER, ForeignKey(Child.id))

    age = Column(INTEGER)

    fav_toy = Column(TEXT)

    master = relationship("Child", back_populates="versions")

    parents = relationship("ParentVersion",
        secondary=lambda: Parent_Child.__table__,
    )

    comparison_keys = [
        'age',
        'fav_toy',
    ]

class Parent_Child(Base):

    __tablename__ = 'parent_child'

    id = Column(INTEGER, primary_key=True)

    parent_id = Column(INTEGER, ForeignKey(ParentVersion.id))
    child_id = Column(INTEGER, ForeignKey(ChildVersion.id))

Okay, so I know the more recent SQLAlchemy models have some idea of versioning, it's possible that I'm doing this the wrong way. But this fits my use case very well. So humor me and let's assume the model is okay (in the general sense - if there's a minor detail causing the bug that would be good to fix)

Now suppose I want to insert data. I have data from some source, I take it in and build models. Ie, split things into Master/Version, assign the child relationships, assign the version relationships. Now I want to compare it against the data already in my database. For each master object, if I find it, I compare the versions. If the versions are different, you create a new version. The tricky part becomes, if a Child version is different, I want to insert a new Parent version, and update all its relationships. Maybe the code makes more sense to explain this part. search_parent is the object I have created in my pre-parsing stage. It has a version, and children objects, which also have versions.

parent_conds = [
    getattr(search_parent.__class__, name) == getattr(search_parent, name)
    for name, column in search_parent.__class__.__mapper__.columns.items()
    if not column.primary_key
]

parent_match = session.query(Parent).filter(*parent_conds).first()

# We are going to make a new version
parent_match.current_version.active=False
parent_match.versions.append(search_parent.current_version)

for search_child in search_parent.children[:]:

    search_child.parent_id = parent_match.id

    search_conds = [
        getattr(search_child.__class__, name) == getattr(search_child, name)
        for name, column in search_child.__class__.__mapper__.columns.items()
        if not column.primary_key
    ]

    child_match = session.query(Child).filter(*search_conds).first()

    if child_match.current_version != search_child.current_version:
        # create a new version: deactivate the old one, insert the new
        child_match.current_version.active=False
        child_match.versions.append(search_child.current_version)

    else:
        # copy the old version to point to the new parent version
        children = parent_match.current_version.children

        children.append(child_match.current_version)
        children.remove(search_child.current_version)
        session.expunge(search_child.current_version)

    session.expunge(search_child)

session.expunge(search_parent)

session.add(parent_match)

session.commit()

Okay, so once again, this might not be the perfect or even best approach. But it does work. EXCEPT, and this is what I can't figure out. It doesn't work if I'm updating the child's age attribute to the integer value zero. If the child objects start with age 0, and I change it to something else, this works beautifully. If I start with some non-zero integer, and update the age to 0, I get this warning:

SAWarning: Object of type <ChildVersion> not in session, add operation   along 'ParentVersion.children' won't proceed (mapperutil.state_class_str(child), operation, self.prop))

The updated version is inserted, however the insert into the parent_child join table doesn't happen. And it's not that it fails, it's that SQLAlchemy has determined the child object doesn't exist and can't create the join. But it does exist, I know it gets inserted.

Again, this only happens if I'm inserting a new version with age=0. If I'm inserting a new version with any other age, this works exactly as I want it to.

There are other odd things about the bug - it doesn't happen if you don't insert enough children (seems to be around 12 triggers the bug), it doesn't happen depending on other attributes sometimes. I don't think I fully understand the surface area of what causes it.

Thanks for taking the time to read this far. I have a fully working demo complete with source data I'd be happy to share, it just requires some setup so I didn't know if it was appropriate in this post. I hope someone has ideas for what to look at because at this point I'm totally out.

edit: Here is the full stack trace leading to the warning.

  File "repro.py", line 313, in <module>
  load_data(session, second_run)
File "repro.py", line 293, in load_data
  session.commit()
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 801, in commit
  self.transaction.commit()
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
  self._prepare_impl()
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
  self.session.flush()
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2019, in flush
  self._flush(objects)
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2101, in _flush
  flush_context.execute()
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
  rec.execute(self)
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 487, in execute
  self.dependency_processor.process_saves(uow, states)
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/dependency.py", line 1053, in process_saves
  False, uowcommit, "add"):
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/dependency.py", line 1154, in _synchronize
  (mapperutil.state_class_str(child), operation, self.prop))
File "/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 1297, in warn
  warnings.warn(msg, exc.SAWarning, stacklevel=2)
File "repro.py", line 10, in warn_with_traceback
  traceback.print_stack()
/Users/me/virtualenvs/dev/lib/python2.7/site-packages/sqlalchemy/orm/dependency.py:1154: SAWarning: Object of type <ChildVersion> not in session, add operation along 'ParentVersion.children' won't proceed
(mapperutil.state_class_str(child), operation, self.prop))

edit2: Here is a gist with a python file you can run to see the strange behavior. https://gist.github.com/jbouricius/2ede420fb1f7a2deec9f557c76ced7f9

jbb
  • 33
  • 6
  • Please post the full stack trace. – univerio Sep 02 '16 at 18:44
  • I'm not sure what you mean - there's no exception raised. SQLAlchemy issues a warning and skips a step I want it to do. But then it proceeds. That warning is all I have. – jbb Sep 02 '16 at 18:47
  • [You can get stack traces from warnings](http://stackoverflow.com/questions/22373927/get-traceback-of-warnings). – univerio Sep 02 '16 at 18:50
  • Got it, thanks. I added it to the post. – jbb Sep 02 '16 at 18:59
  • Can you also post a minimal example that still reproduces the issue? Not the full dataset, mind you, just something like `search_parent = Parent(children=[Child(versions=[ChildVersion(...)])])` etc. – univerio Sep 02 '16 at 19:00
  • Also, just a wild guess: your `session.expunge` lines are probably the culprit. You have a detached instance somewhere in the object graph that you're trying to commit. – univerio Sep 02 '16 at 19:02
  • I unfortunately can't, because I don't have one. I set out to build a minimal example, only it worked exactly as expected. I was 200 lines into my demo when I realized I'd basically rebuilt my previous code. Also I can only get the bug to arise when adding multiple children. – jbb Sep 02 '16 at 19:04
  • Yeah I definitely grant that `session.expunge` is dangerous. I just don't understand why it "almost always" works. – jbb Sep 02 '16 at 19:04
  • It almost always works because you probably almost always have different instances. Expunging an instance not in the object graph of the instance you are trying to commit is okay. Why are you expunging anyway? – univerio Sep 02 '16 at 19:09
  • Can I trouble you to try my full demo? I created a gist here: https://gist.github.com/jbouricius/2ede420fb1f7a2deec9f557c76ced7f9 – jbb Sep 02 '16 at 19:13
  • I'm expunging because I have the existing Master/Version objects, and then I have new Master/Version objects to load. If I attach the pending Version to the existing Master, the pending Master still gets loaded unless I expunge it. – jbb Sep 02 '16 at 19:13
  • Your stage_data.sql is incomplete. Try to distill this down to a single .py file where you use sqlite and you add the data with SQLAlchemy, such that the only code necessary to reproduce the issue is just that .py file. – univerio Sep 02 '16 at 19:23
  • Apologies, made an error in my setup. Fixed and had a coworker try that it works. Also, I'm hesitant to use sqlite because there's a possibility it's specific to postgres somehow. – jbb Sep 02 '16 at 19:32
  • 1
    Okay, @univerio or anyone else still reading this, I edited the gist: https://gist.github.com/jbouricius/2ede420fb1f7a2deec9f557c76ced7f9 It uses sqlite3 so there should be no setup required. Just run `python repro.py` and `python repro.py --bug` to see the behavior. – jbb Sep 02 '16 at 19:58

1 Answers1

1

The reason you get this error is that you've inadvertently added objects into the session.

Here is the MVCE:

engine = create_engine("sqlite://", echo=False)


def get_data():
    children = [
        Child(name="Carol", versions=[ChildVersion(age=0, fav_toy="med")]),
        Child(name="Timmy", versions=[ChildVersion(age=0, fav_toy="med")]),
    ]
    return Parent(
        name="Zane", children=children,
        versions=[
            ParentVersion(
                address="123 Fake St",
                children=[v for child in children for v in child.versions]
            )
        ]
    )


def main():
    Base.metadata.create_all(engine)

    session = Session(engine)
    parent_match = get_data()
    session.add(parent_match)
    session.commit()

    with session.no_autoflush:
        search_parent = get_data()

        parent_match.versions.append(search_parent.current_version)
        for search_child in search_parent.children[:]:
            child_match = next(c for c in parent_match.children if c.name == search_child.name)

            if child_match.current_version != search_child.current_version:
                child_match.versions.append(search_child.current_version)
            else:
                session.expunge(search_child.current_version)

            session.expunge(search_child)

        session.expunge(search_parent)
        session.commit()

Aside: this is what you needed to provide in the question itself. Providing a tarball with instructions is not the best way to get answers.

The line

parent_match.versions.append(search_parent.current_version)

not only adds search_parent.current_version, it also adds search_parent, which in turn adds all related objects, including the child versions of other children. Judging by the fact that you later expunge other related objects to prevent them from being added to the session, I conclude that you only want to add search_parent.current_version without adding other related objects. Due to the circular nature of your relationships you need to take care to lift only the objects you want out of search_parent before you add them. Here is the fixed MVCE:

with session.no_autoflush:
    search_parent = get_data()

    current_parent_version = search_parent.current_version
    search_parent.versions.remove(current_parent_version)
    current_parent_version.children = []  # <--- this is key
    for search_child in search_parent.children[:]:
        child_match = next(c for c in parent_match.children if c.name == search_child.name)

        if child_match.current_version != search_child.current_version:
            current_child_version = search_child.current_version
            search_child.versions.remove(current_child_version)
            child_match.versions.append(current_child_version)
            current_parent_version.children.append(current_child_version)

    parent_match.versions.append(current_parent_version)

    session.commit()
univerio
  • 19,548
  • 3
  • 66
  • 68
  • 1
    Thanks for the answer, really appreciate you taking the time. Sorry about the tarball. This isn't quite working for me, it suppresses the warning but it's missing the other child objects. But the idea about removing the parent objects first is really helpful. Thanks again for looking at this. – jbb Sep 02 '16 at 22:22
  • Ah, just required more tweaking in the case when the versions are the same and you want to port them forward. You're a hero! – jbb Sep 02 '16 at 22:46