
I'm starting a new application and looking at using an ORM -- in particular, SQLAlchemy.

Say I've got a column 'foo' in my database and I want to increment it. In straight sqlite, this is easy:

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update stuff set foo = foo + 1')

I figured out the SQLAlchemy SQL-builder equivalent:

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)

This is slightly slower, but there's not much in it.

Here's my best guess for a SQLAlchemy ORM approach:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that's because it has to bring all the data into memory before it can work with it.

Is there any way to generate the efficient SQL using SQLAlchemy's ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?

John Fouhy
  • Ok, I'm assuming the answer is "this is not something ORMs do well". Oh well; I live and learn. – John Fouhy Nov 09 '08 at 23:58
  • There have been some experiments run on different ORMs and how they perform under load and duress. Don't have a link handy, but worth reading. – Matthew Schinckel Nov 18 '08 at 11:43
  • Another problem that exists with the last (ORM) example is that it is not [atomic](http://en.wikipedia.org/wiki/Atomic_operation). – Marian May 29 '10 at 01:38

6 Answers


SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.
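Here's a minimal sketch of that second point (hypothetical `Stuff` model, SQLAlchemy 1.4+ API): a plain SQL statement doesn't touch objects already loaded in the session until they're expired or refreshed.

```python
from sqlalchemy import Column, Integer, create_engine, text
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Stuff(Base):
    __tablename__ = "stuff"
    id = Column(Integer, primary_key=True)
    foo = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Stuff(id=1, foo=10))
    session.commit()

    obj = session.get(Stuff, 1)
    before = obj.foo                      # 10, loaded into the session
    session.execute(text("UPDATE stuff SET foo = foo + 1"))
    stale = obj.foo                       # still 10: plain SQL bypassed the session
    session.expire(obj)                   # discard cached attribute values
    fresh = obj.foo                       # 11, re-read from the database
```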

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says: fetch all the objects from the database, modify them all, and then, when it's time to flush the changes, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This executes as a single query, as you would expect, and because the default session configuration expires all data in the session on commit, you won't have any stale data issues.

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
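That `synchronize_session=False` variant looks roughly like this end to end (hypothetical `Stuff` model; `Query.update()` still works in modern SQLAlchemy, though it's considered legacy in 2.0):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Stuff(Base):
    __tablename__ = "stuff"
    id = Column(Integer, primary_key=True)
    foo = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Stuff(foo=1), Stuff(foo=2)])
session.commit()

# Skip reconciling in-session objects; safe here because we don't touch
# them again until after commit() expires everything anyway.
rowcount = session.query(Stuff).update(
    {Stuff.foo: Stuff.foo + 1}, synchronize_session=False)
session.commit()
values = sorted(s.foo for s in session.query(Stuff))
```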

Ants Aasma
  • in the 3rd way, will it trigger orm event(like after_update)? – Ken Mar 11 '17 at 11:52
  • @Ken, no, it won't. See the API doc for Query.update https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.update. Instead you have an event for after_bulk_update https://docs.sqlalchemy.org/en/13/orm/events.html#sqlalchemy.orm.events.SessionEvents.after_bulk_update – TrilceAC May 26 '20 at 11:11
session.query(Clients).filter(Clients.id.in_(client_id_list)).update({'status': status})
session.commit()

Try this =)

Vin
  • This method worked for me. But the problem is its slow. It needs a good piece of time for a few 100k data records. Is there maybe a faster method? – sunwarr10r Dec 23 '16 at 18:16
  • Thanks a lot this approach worked for me. Its really bad that sqlachemy doesn't has a shorter way to update the `json` column – Jai Prakash Jun 26 '18 at 03:47
  • 13
    For those still having performance issues when using this method: by default this might do a SELECT for every record first, and only UPDATE afterwards. Passing synchronize_session=False to the update() method prevents this from happening, but make sure to only do this if you don't use the objects you update again before the commit(). – teuneboon Jan 09 '19 at 12:45
  • what is `Clients`? I don't see any reference to it in the original post or in yours, so I'm not too sure what's going on here. – baxx Nov 21 '21 at 19:17

There are several ways to UPDATE using SQLAlchemy:

1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()

2) session.query(Stuff).update({"foo": Stuff.foo + 1})
   session.commit()

3) conn = engine.connect()
   table = Stuff.__table__
   stmt = table.update().values({'foo': Stuff.foo + 1})
   conn.execute(stmt)
   conn.commit()
Nima Soroush

Here's an example of how to solve the same problem without having to map the fields manually:

from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgresql://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)

Base = declarative_base()


class Media(Base):
  __tablename__ = 'media'
  id = Column(Integer, primary_key=True)
  title = Column(String, nullable=False)
  slug = Column(String, nullable=False)
  type = Column(String, nullable=False)

  def update(self):
    s = session()
    mapped_values = {}
    for field_name, field_type in Media.__dict__.items():
      if isinstance(field_type, InstrumentedAttribute):
        mapped_values[field_name] = getattr(self, field_name)

    s.query(Media).filter(Media.id == self.id).update(mapped_values)
    s.commit()

So to update a Media instance, you can do something like this:

media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()
plowman

If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.

If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).
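To illustrate the lazy-loading point, here's a minimal sketch (hypothetical `Company`/`Person` models, modern SQLAlchemy): with the default `lazy="select"` relationship, fetching a Company doesn't pull in its People until the collection is first accessed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()

class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # lazy="select" (the default) defers loading People until first access
    people = relationship("Person", lazy="select")

class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    company_id = Column(Integer, ForeignKey("company.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Company(name="Acme", people=[Person(), Person()]))
    session.commit()

    company = session.query(Company).first()  # loads only the company row
    n = len(company.people)                   # relationship query fires here
    name = company.name
```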

Matthew Schinckel
  • Nah, the table's all on its own. I've never used an ORM before -- is this just something they're bad at? – John Fouhy Nov 07 '08 at 01:17
  • 1
    There is an overhead due to creating Objects, but in my opinion it is worth the penalty - being able to persistently store objects in a database is awesome. – Matthew Schinckel Nov 18 '08 at 11:42

Without testing, I'd try:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(commit() implies a flush, so a separate flush() isn't needed.)

I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.

[Please note comment below - this did not speed things up at all].

Matthew Schinckel