0

I have two related tables, User and UserDownload, now I want to filter out UserDownload which it's created_at is bigger than the created_at of the user plus one day, so the python code is :

result = db.session.query(UserDownload.uid).join(User, UserDownload.uid == User.id).filter(UserDownload.created_at >= User.created_at + timedelta(days=1)).all()

it's logic seems correct, but the result is weird, some result, the created time of user plus one day is little than the created_at of UserDownload, but some are not . and the raw sql I check the string of the query is :

SELECT user_downloads.uid AS user_downloads_uid \nFROM user_downloads JOIN users ON user_downloads.uid = users.id \nWHERE user_downloads.created_at >= users.created_at + :created_at_1

really don't know what :created_at_1 means.

for example, the result contains a such user_download(I replace UserDownload.uid with UserDownload, and query User by its uid):

user_download.created_at: datetime.datetime(2015, 12, 3, 8, 39, 56) user.created_at: datetime.datetime(2015, 12, 2, 11, 7, 14)

李东勇
  • 190
  • 1
  • 10

1 Answers1

1

Depending on the backend that you are using, you need your filter to generate sql like (for mysql):

...WHERE user_downloads.created_at >= DATE_ADD(users.created_at, INTERVAL 1 DAY)

SQLAlchemy doesn't convert arithmetic between a datetime object and an InstrumentedAttribute object into that DATE_ADD (or equivalent depending on backend) function. So where you filter by this:

UserDownload.created_at >= User.created_at + timedelta(days=1)

it gets converted to this:

...WHERE user_downloads.created_at >= users.created_at + :created_at_1

where it treats timedelta(days=1) as a literal value, and parameterises it. That is what the :created_at_1 is, a parameter that holds the place in the query for the timedelta object which will be passed along with the query to the server (as a side note, in MySQL that timedelta actually gets converted into a datetime object that is epoch + 1 day, as MySQL doesn't have a native INTERVAL type).

So to get the query to do what you want, you need to use the sqlalchemy.func object to generate the server side function that you need. Continuing with the MySQL example, an appropriate query might be:

from sqlalchemy import func, text

q = session.query(UserDownload.uid).\
    join(User, UserDownload.uid == User.id).\
    filter(
        UserDownload.created_at >=
        func.DATE_ADD(
            User.created_at,
            text('INTERVAL 1 DAY')
        )
    )

Which generates:

SELECT user_downloads.uid
FROM user_downloads INNER JOIN users ON user_downloads.uid = users.id
WHERE user_downloads.created_at >= DATE_ADD(users.created_at, INTERVAL 1 DAY)

I found this question helpful: Using DATEADD in sqlalchemy

From the comments

since mysql can handle timedelta(days=1) params, why the query I used fails.

OK, I'll try to go into more detail about your original query, but give me some latitude as I'm working this out as I go. Lets forget about the timedelta for a second and just see what the sql generated is. So this:

session.query(UserDownload.uid).join(User, UserDownload.uid == User.id).filter(UserDownload.created_at >= User.created_at)

generates this sql:

SELECT user_downloads.uid AS user_downloads_uid
FROM user_downloads INNER JOIN users ON user_downloads.uid = users.id
WHERE user_downloads.created_at >= users.created_at

Nothing hard to grok there. Now when we add back in the timedelta the sql generated is exactly the same except that we add a value on to users.created_at which is represented by the bind parameter created_at_1:

SELECT user_downloads.uid AS user_downloads_uid
FROM user_downloads INNER JOIN users ON user_downloads.uid = users.id
WHERE user_downloads.created_at >= users.created_at + %(created_at_1)s

So is the comparison executed first, or the addition? Running this query...

print(engine.execute(text("SELECT 3 >= 2 + 2")).fetchall())

... returns 0 (i.e. False), which proves that the addition (2 + 2) is resolved before the comparison (>=). So safe to assume that happens in your query too and from that, the value of created_at_1 is added to users.created_at prior to the comparison with user_downloads.created_at. When I execute your query, this is the parameter value passed to the server:

{'created_at_1': datetime.datetime(1970, 1, 2, 0, 0)}

So even though you add timedelta(days=1) to users.created_at in your filter, SQLAlchemy actually passes a datetime object equivalent to <epoch> + <timedelta>, or in this case: datetime(1970, 1, 1, 0, 0) + timedelta(days=1) or datetime(1970, 1, 2, 0, 0) (which is the value that you see in the parameter value dictionary above).

So what exactly is the value of user.created_at + datetime(1970, 1, 2, 0, 0)? I added a User instance into the database with created_at = datetime(1970, 1, 1, 0, 0):

session.add(User(id=1, created_at=datetime(1970, 1, 1, 0, 0)))
session.commit()

Then ran this query:

engine.execute(text("SELECT created_at + :a FROM users"), a=datetime(1970, 1, 2, 0, 0)).fetchall()

which returned:

[(19700101001970.0,)]

That is the value of user.created_at without any formatting, with the year portion of the datetime(1970, 1, 2, 0, 0) concatenated to the end. So that is what your query is comparing to user_download.created_at. For the sake of (relative) brevity, I'm not going to look into how the comparison of user_download.created_at and that value works, but hopefully I've demonstrated that the end result of your query is not comparing user_download.created_at with users.created_at plus 1 day.

when I use query like: User.query.filter(User.created_at > datetime.now() + timedelta(days=1)), it works fine.

Using that query, here is the generated sql:

SELECT users.id AS users_id, users.created_at AS users_created_at
FROM users
WHERE users.created_at > %(created_at_1)s

and the value passed on to the server is:

{'created_at_1': datetime.datetime(2018, 11, 21, 21, 38, 51, 670890)}

So you can see that the datetime + timedelta part is resolved before being passed to the database and as such, the database operation is a simple comparison of a DATETIME column to a datetime value.

SuperShoot
  • 9,880
  • 2
  • 38
  • 55
  • Thanks, It helps. But I still have a question, since mysql can handle timedelta(days=1) params, why the query I used fails. Btw, when I use query like: User.query.filter(User.created_at > datetime.now() + timedelta(days=1)), it works fine. Is it because you can't do the date add between python object and the column in database? – 李东勇 Nov 20 '18 at 09:58
  • I've added to the answer above to try and address your extra questions. – SuperShoot Nov 20 '18 at 11:46
  • btw, 19700101001970.0, means the original date plus 19min and 70 seconds to me. and when I migrate the original data to mongo, and query it by aggregate, it seems like the origin query filter out user_download bigger than user created_at plus about 20mins. – 李东勇 Nov 21 '18 at 01:55