30

I currently have a column that contains HTML markup. Inside that markup, there is a timestamp that I want to store in a new column (so I can query against it). My idea was to do the following in a single migration:

  1. Create a new, nullable column for the data
  2. Use the ORM to pull back the HTML I need to parse
  3. For each row
    1. parse the HTML to pull out the timestamp
    2. update the ORM object

But when I try to run my migration, it appears to be stuck in an infinite loop. Here's what I've got so far:

import html5lib
from dateutil.parser import parse as parse_date  # or wherever parse_date comes from


def _extract_publication_date(html):
    root = html5lib.parse(html, treebuilder='lxml', namespaceHTMLElements=False)
    publication_date_string = root.xpath("//a/@data-datetime")[0]
    return parse_date(publication_date_string)


def _update_tip(tip):
    tip.publication_date = _extract_publication_date(tip.rendered_html)
    tip.save()


def upgrade():
    op.add_column('tip', sa.Column('publication_date', sa.DateTime(timezone=True)))
    tips = Tip.query.all()
    for tip in tips:
        _update_tip(tip)


def downgrade():
    op.drop_column('tip', 'publication_date')
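
(Editor's aside: if `html5lib`/`lxml` turn out to be unavailable in the migration environment, the extraction step can be sketched with just the standard library. `extract_publication_date` here is a hypothetical stand-in for the function above, assuming the timestamp is an ISO-8601 string in a `data-datetime` attribute on an `<a>` tag.)

```python
# Hypothetical stdlib-only sketch of the extraction step.
from datetime import datetime
from html.parser import HTMLParser

class _DatetimeFinder(HTMLParser):
    """Records the first data-datetime attribute seen on an <a> tag."""
    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.value is None:
            for name, val in attrs:
                if name == 'data-datetime':
                    self.value = val

def extract_publication_date(html):
    finder = _DatetimeFinder()
    finder.feed(html)
    return datetime.fromisoformat(finder.value)

print(extract_publication_date('<a data-datetime="2012-12-03T05:59:00">x</a>'))
# → 2012-12-03 05:59:00
```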
Hank Gay
  • How do you know it is stuck in an infinite loop? – X-Istence Dec 03 '12 at 05:59
  • If `Tip.query` is not using the same session as `op`, then there will be 2 transactions, with the `SELECT` one stuck waiting for the `ALTER TABLE` one to commit. Anyway, I think it is cleaner to move the ORM portion to its own script, to be run manually after `alembic upgrade`. – sayap Dec 03 '12 at 07:10
  • @X-Istence I don't know it's stuck in an infinite loop. I **DO** know that the command never returns. – Hank Gay Dec 03 '12 at 13:15
  • @sayap I had considered that, but then that means that I can't track all my database upgrade logic in the same place. Also, if the ORM logic works, then I can add another structural step to the migration to make the new column `NOT NULL`. I'd prefer it if I could just figure out how to make the ORM use the same `Session` as the `op.foo` methods. – Hank Gay Dec 03 '12 at 13:18
  • I too strive to have the logic in the same place, but only when I can do so with SQL. Alembic doesn't recommend close coupling between the migration scripts and the application code. Though I do see your points. You may want to try to do the query first and build a dict of `id: publication_date`, then do an update on that using `op.execute()`. If your ORM session has `autocommit=True`, I think the first transaction will be closed right after the query, so there won't be overlapping transactions. – sayap Dec 03 '12 at 13:50

4 Answers

42

After a bit of experimentation using @velochy's answer, I settled on something like the following pattern for using SqlAlchemy inside Alembic. This worked great for me and could probably serve as a general solution for the OP's question:

from sqlalchemy.orm.session import Session
from alembic import op

# Copy the model definitions into the migration script if
# you want the migration script to be robust against later
# changes to the models. Also, if your migration includes
# deleting an existing column that you want to access as 
# part of the migration, then you'll want to leave that 
# column defined in the model copies here.
class Model1(Base): ...
class Model2(Base): ...

def upgrade():
    # Attach a sqlalchemy Session to the env connection
    session = Session(bind=op.get_bind())

    # Perform arbitrarily-complex ORM logic
    instance1 = Model1(foo='bar')
    instance2 = Model2(monkey='banana')

    # Add models to Session so they're tracked
    session.add(instance1)
    session.add(instance2)

    # Apply a transform to existing data
    m1s = session.query(Model1).all()
    for m1 in m1s:
        m1.foo = transform(m1.foo)
    session.commit()

def downgrade():
    # Attach a sqlalchemy Session to the env connection
    session = Session(bind=op.get_bind())

    # Perform ORM logic in downgrade (e.g. clear tables)
    session.query(Model2).delete()
    session.query(Model1).delete()

    # Revert transform of existing data
    m1s = session.query(Model1).all()
    for m1 in m1s:
        m1.foo = un_transform(m1.foo)
    session.commit()

This approach appears to handle transactions properly. Frequently while working on this, I would generate DB exceptions and they would roll things back as expected.

ntc2
killthrush
  • It's worth noting that this technique may be ill-advised if your models are changing frequently. When models change, it may break old migrations that assume models have a certain shape. – killthrush Jul 17 '16 at 17:40
  • I wouldn't say that this technique is ill-advised if models are changing, it just might be safer to define the models within the migration rather than importing. – kevlarr Jun 17 '19 at 13:07
  • Yes, making a copy of a model for use inside older migrations is a legit technique to resolve this issue. Then, the changes can happen *without* breaking old stuff. The main question then becomes whether or not that creates enough extra work to be burdensome. That depends, IMO. Might be fine! – killthrush Jun 18 '19 at 13:45
  • I edited the answer to suggest defining the models in the migration. Note that defining the models in the migration is necessary in some special cases, such as when deleting a column from the model but also using that column as part of the migration before the deletion. – ntc2 Jan 23 '21 at 09:42
10

What worked for me is to get a session by doing the following:

import sqlalchemy as sa
import sqlalchemy.orm
from alembic import op

connection = op.get_bind()
Session = sa.orm.sessionmaker()
session = Session(bind=connection)
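
(Editor's note: a self-contained sketch of this pattern, with an in-memory SQLite engine standing in for `op.get_bind()`; inside a real migration the session would be bound to the migration connection instead.)

```python
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# In-memory SQLite stands in for op.get_bind() so the sketch runs
# outside Alembic; in a migration, bind to the migration connection.
engine = sa.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE tip (id INTEGER PRIMARY KEY)"))

Session = sessionmaker()
session = Session(bind=engine)
session.execute(sa.text("INSERT INTO tip (id) VALUES (1)"))
session.commit()
count = session.execute(sa.text("SELECT COUNT(*) FROM tip")).scalar()
```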
velochy
  • This one *sort of* worked for me, though there were errors saying my models were already bound to a session. – killthrush Mar 17 '16 at 21:21
8

You can use the automap extension to automatically create ORM models of your database as it exists at the time of the migration, without copying them into the code:

import sqlalchemy as sa
from alembic import op
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

Base = automap_base()

def upgrade():
    # Add the new column
    op.add_column('tip', sa.Column('publication_date', sa.DateTime(timezone=True)))

    # Reflect ORM models from the database
    # Note that this needs to be done *after* all needed schema migrations.
    bind = op.get_bind()
    Base.prepare(autoload_with=bind)  # SQLAlchemy 1.4 and later
    # Base.prepare(bind, reflect=True)  # SQLAlchemy before version 1.4
    Tip = Base.classes.tip  # "tip" is the table name

    # Query/modify the data as it exists during the time of the migration
    session = Session(bind=bind)
    tips = session.query(Tip).all()
    for tip in tips:
        # arbitrary update logic
        ...
    session.commit()


def downgrade():
    op.drop_column('tip', 'publication_date')
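
(Editor's note: a self-contained illustration of the automap reflection step, again with in-memory SQLite standing in for the migration connection. The table needs a primary key for automap to generate a class.)

```python
import sqlalchemy as sa
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# In-memory SQLite stands in for the migration connection.
engine = sa.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE tip (id INTEGER PRIMARY KEY, publication_date TIMESTAMP)"))
    conn.execute(sa.text("INSERT INTO tip (id) VALUES (1)"))

Base = automap_base()
Base.prepare(autoload_with=engine)  # SQLAlchemy 1.4+
Tip = Base.classes.tip  # class generated from the "tip" table

session = Session(bind=engine)
n = session.query(Tip).count()
```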
3

Continuing from the comments, you can try something like this:

import sqlalchemy as sa


tip = sa.sql.table(
    'tip',
    sa.sql.column('id', sa.Integer),
    sa.sql.column('publication_date', sa.DateTime(timezone=True)),
)


def upgrade():
    mappings = [
        (x.id, _extract_publication_date(x.rendered_html))
        for x in Tip.query
    ]

    op.add_column('tip', sa.Column('publication_date', sa.DateTime(timezone=True)))

    exp = sa.sql.case(value=tip.c.id, whens=[
        (op.inline_literal(id), op.inline_literal(publication_date))
        for id, publication_date in mappings
    ])

    op.execute(tip.update().values({'publication_date': exp}))


def downgrade():
    op.drop_column('tip', 'publication_date')
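
(Editor's note: the same CASE-expression bulk update can be demonstrated self-contained against in-memory SQLite; the `mappings` values below are made-up dates standing in for the parsed timestamps.)

```python
import sqlalchemy as sa

# Lightweight table stub, as in the answer above.
tip = sa.table(
    'tip',
    sa.column('id', sa.Integer),
    sa.column('publication_date', sa.String),
)

engine = sa.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE tip (id INTEGER PRIMARY KEY, publication_date TEXT)"))
    conn.execute(sa.text("INSERT INTO tip (id) VALUES (1), (2)"))

    # id -> value pairs, turned into a single CASE tip.id WHEN ... update.
    mappings = [(1, '2012-12-03'), (2, '2012-12-04')]
    exp = sa.case(dict(mappings), value=tip.c.id)
    conn.execute(tip.update().values(publication_date=exp))

    rows = conn.execute(
        sa.text("SELECT id, publication_date FROM tip ORDER BY id")
    ).fetchall()
```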
sayap