2

I want to do this but in SQLAlchemy. The only difference is that rather than only being able to get the most recent record, I want to be able to get the most recent record before a given timestamp. As long as I ensure rows are never deleted, this allows me to view the database as it was on a particular timestamp.

Let's say my model looks like this:

from datetime import datetime
from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative include declarative_base
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id_ = Column("id", Integer, primary_key=True, index=True, nullable=False)
    timestamp = Column(DateTime, primary_key=True, index=True, nullable=False, default=datetime.utcnow())
    # other non-primary attributes would go here

And I have this users table (timestamps simplified):

| id_ | timestamp |
-------------------
  0     1
  0     4
  0     6
  1     3
  2     7
  2     3

For example, if I request a snapshot at timestamp = 4, I want to get:

| id_ | timestamp |
-------------------
  0     4
  1     3
  2     3

The best I can come up with is doing it procedurally:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
db_engine = create_engine(...)
SessionLocal = sessionmaker(bind=db_engine, ...)
db_session = SessionLocal()

def get_snapshot(timestamp: datetime):
    all_versions = db_session.query(User).filter(User.timestamp <= timestamp).order_by(desc(User.timestamp))
    snapshot = []
    for v in all_versions:
        if v.id_ not in (i.id_ for i in snapshots):
            snapshot.append(v)
    return snapshot

However, this gives me a list of model objects rather than a sqlalchemy.orm.query.Query, so I have to treat the result differently to standard queries in other parts of my code. Can this be done all in the ORM?

Thanks in advance

microbug
  • 131
  • 1
  • 4

2 Answers2

1

Have you tried:

all_versions = db_session.query(User, func.max(User.timestamp)).\
               filter(User.timestamp <= timestamp).\
               group_by(User.id_)               

You can read more about generic functions in SQLAlchemy here

Matteo Di Napoli
  • 567
  • 5
  • 17
  • I've responded to your answer in another answer as I didn't have space here – microbug Jun 20 '19 at 13:16
  • Unfortunately that doesn't work completely. It returns the correct answer when `timestamp=datetime.utcnow()` (and there are no future entries), but when going backwards in time you lose entries completely rather than getting past versions of them. For example, if I apply your current solution to the example in the question, I would get a response of only `2, 3` (one row) and nothing else. Your old solution (in my answer below) that outputs `sqlalchemy.util._collections.result` objects would output `0, 4 | 1, 3 | 2, 3`. – microbug Jun 20 '19 at 15:40
  • I don't really understand how the model replacement operated by `with_entities` messes up the result. I will go back to the previous working solution. – Matteo Di Napoli Jun 20 '19 at 17:02
  • I've updated my answer to use your code as a subquery, allowing me to return a `sqlalchemy.orm.query.Query` object. – microbug Jun 20 '19 at 21:20
-1

An alternative to Matteo's solution is to use a subquery and join it to the table, which gives the result in my preferred format of a sqlalchemy.orm.query.Query object. Credit to Matteo for the code for the subquery:

subq = db_session.query(User.id_, func.max(User.timestamp).label("maxtimestamp")).filter(User.timestamp < timestamp).group_by(User.id_).subquery()
q = db_session.query(User).join(subq, and_(User.id_ == subq.c.id, User.timestamp == subq.c.maxtimestamp))

SQL generation

Note that this is probably less efficient than Matteo's solution:

SQL generated by subquery solution

SELECT users.id AS users_id, users.timestamp AS users_timestamp, users.name AS users_name, users.notes AS users_notes, users.active AS users_active
FROM users JOIN (SELECT users.id AS id, max(users.timestamp) AS maxtimestamp
FROM users
WHERE users.timestamp < ? GROUP BY users.id) AS anon_1 ON users.id = anon_1.id AND users.timestamp = anon_1.maxtimestamp

SQL generated by Matteo's solution:

SELECT users.id AS users_id, users.timestamp AS users_timestamp, users.name AS users_name, users.notes AS users_notes, users.active AS users_active, max(users.timestamp) AS max_1
FROM users
WHERE users.timestamp <= ? GROUP BY users.id

Previous content of this answer

@Matteo Di Napoli

Thanks, your post is more or less what I need. The output of this is an sqlalchemy.util._collections.result, which behaves like a tuple from what I can see. In my application I need the full User objects, not just id / timestamp pairs, so the better fit for me is:

from sqlalchemy import func 

all_versions = db_session.query(User, func.max(User.timestamp)).\
               filter(User.timestamp <= timestamp).\
               group_by(User.id_)

Returning something like:

> for i in all_versions: print(i)
...
(<User "my test user v2", id 0, modified 2019-06-19 14:42:16.380381>, datetime.datetime(2019, 6, 19, 14, 42, 16, 380381))
(<User "v2", id 1, modified 2019-06-19 15:53:53.147039>, datetime.datetime(2019, 6, 19, 15, 53, 53, 147039))
(<User "a user", id 2, modified 2019-06-20 12:34:56>, datetime.datetime(2019, 6, 20, 12, 34, 56))

I can then access the User objects with all_versions[n][0] or get a list with l = [i[0] for i in all_versions] (thanks to Matteo Di Napoli for the nicer syntax there).

The perfect end result would be if I could get a result that is still a sqlalchemy.orm.query.Query (like all_versions), but with each item a User object rather than a sqlalchemy.util._collections.result. Is that possible?

microbug
  • 131
  • 1
  • 4
  • Can you build your end result with `users = [result[0] for result in all_versions]`? – Matteo Di Napoli Jun 20 '19 at 13:25
  • @MatteoDiNapoli not quite. That returns a list, which is not the same as a `sqlalchemy.orm.query.Query`. It means that there are some small ways in which I must treat it differently to query objects elsewhere. For example, `len(all_versions)` won't work (replace with `all_versions.count()`) -- for a list it's the other way around. A minor gripe but it would be nice if I could avoid it. Of course I could convert all query results to list in my code elsewhere... – microbug Jun 20 '19 at 13:31
  • I've updated my answer with a new version that returns what you require – Matteo Di Napoli Jun 20 '19 at 13:59