1

I am using the table structure from this page:

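import datetime
from sqlalchemy import Column, DateTime, Integer, Sequence, String

# Base is the declarative base class defined elsewhere in the application.
class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, Sequence('book_seq'), primary_key=True)
    name = Column(String(50))
    time_added = Column(DateTime, default=datetime.datetime.now)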

Say I have these books added to the table:

  1. day1, 1:00
  2. day1, 2:00
  3. day2, 1:00
  4. day2, 2:00
  5. day2, 3:00

How can I write a query that selects rows 1 and 3, that is, the first book added on each day?

I have tried

Book.query.order_by(Book.time_added).\
    group_by(db.func.date(Book.time_added, "start of day"))

But it actually gives the last row of every day, which is wrong. I have no idea what to try next, because the SQLAlchemy documentation and examples I could find are so limited.

Please help me find out what the problem is, or tell me the right query. Thanks a lot.

Edit:
Thanks for the suggestions to clarify.

The database backend I use is SQLite.

As for the inputs and outputs of the problem:
The books table contains many books added at different times on different days (the time_added values are distinct). I want to select the books that were the first ones inserted on each day: treat each day as a group, and select the book with the earliest time_added in that group.

Ilja Everilä
Oliver.X
  • You're looking for a [tag:greatest-n-per-group] query, which means this is more an SQL problem than SQLAlchemy problem (regarding documentation). – Ilja Everilä May 21 '18 at 04:39
  • Also please add what DB you're using, as such a query is a bit DB dependent, though there are generic solutions as well. Wouldn't you be looking for the 3rd row instead of 4th? Please edit your question so that there are clear inputs and expected outputs (compared to actual). What does it mean that it "gives the last result of everyday"? – Ilja Everilä May 21 '18 at 06:59
  • Yes, you are right, it is the 3rd row. And I use sqlite as my database. I have edited my question. Thanks – Oliver.X May 22 '18 at 06:02

2 Answers

2

This is a somewhat common problem in SQL and has its own tag: greatest-n-per-group. In your case N = 1, which makes it a bit simpler in SQLite. One solution is to use an antijoin, in other words to select the rows for which no row with a lesser timestamp exists on the same day. This can be done with a left join:

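from sqlalchemy import and_, func
from sqlalchemy.orm import aliased

# Second reference to the books table, used to look for earlier rows on the same day.
# Rows of Book that find no such match are the first ones of their day.
book_alias = aliased(Book)
Book.query.\
    outerjoin(
        book_alias,
        and_(func.date(Book.time_added) == func.date(book_alias.time_added),
             Book.time_added > book_alias.time_added)).\
    filter(book_alias.id.is_(None)).\
    all()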

The same thing, using a NOT EXISTS subquery expression (not_ is imported from sqlalchemy):

Book.query.\
    filter(not_(db.session.query(book_alias).
                filter(book_alias.time_added < Book.time_added,
                       func.date(book_alias.time_added) ==
                       func.date(Book.time_added)).
                exists())).\
    all()

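Of course you could also use a correlated subquery to find the minimum time_added within each day, and filter based on that (on SQLAlchemy 1.4 or later, calling .scalar_subquery() on sq makes the scalar use explicit):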

sq = db.session.query(func.min(book_alias.time_added)).\
    filter(func.date(book_alias.time_added) == func.date(Book.time_added))

Book.query.\
    filter(Book.time_added == sq).\
    all()

The queries may perform differently depending on your data, so test what suits you best.
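To make the three approaches easier to compare, here is a minimal sketch that runs roughly equivalent SQL directly through Python's built-in sqlite3 module rather than through SQLAlchemy. The table and column names follow the Book model from the question; the book names and concrete timestamps are made up to match the "day1"/"day2" sample rows, and the SQL is hand-written, so it is not necessarily the exact statement SQLAlchemy would emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, time_added TEXT)"
)
# Hypothetical sample data: two books on day 1, three on day 2.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (1, "book1", "2018-05-20 01:00:00"),
        (2, "book2", "2018-05-20 02:00:00"),
        (3, "book3", "2018-05-21 01:00:00"),
        (4, "book4", "2018-05-21 02:00:00"),
        (5, "book5", "2018-05-21 03:00:00"),
    ],
)

# Antijoin: keep rows that have no earlier row on the same day.
anti_join_sql = """
    SELECT b.id
    FROM books AS b
    LEFT JOIN books AS earlier
      ON date(b.time_added) = date(earlier.time_added)
     AND b.time_added > earlier.time_added
    WHERE earlier.id IS NULL
    ORDER BY b.id
"""

# NOT EXISTS: the same condition written as a correlated subquery.
not_exists_sql = """
    SELECT b.id
    FROM books AS b
    WHERE NOT EXISTS (
        SELECT 1
        FROM books AS earlier
        WHERE earlier.time_added < b.time_added
          AND date(earlier.time_added) = date(b.time_added)
    )
    ORDER BY b.id
"""

# Correlated MIN: keep rows whose timestamp is the minimum of their day.
min_subquery_sql = """
    SELECT b.id
    FROM books AS b
    WHERE b.time_added = (
        SELECT min(other.time_added)
        FROM books AS other
        WHERE date(other.time_added) = date(b.time_added)
    )
    ORDER BY b.id
"""

results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in [
        ("anti_join", anti_join_sql),
        ("not_exists", not_exists_sql),
        ("min_subquery", min_subquery_sql),
    ]
}
print(results)
```

On this sample data all three queries select the books with ids 1 and 3. Timing each statement against a realistic copy of your own table is one way to decide which form suits you best.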

Ilja Everilä
0

You can use the row_number() window function with the condition row_number = 1. Follow this example: Complex query (subqueries, window functions) with sqlalchemy
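As a rough illustration of the row_number() idea, the sketch below uses plain SQL through Python's built-in sqlite3 module instead of SQLAlchemy. It assumes the SQLite library bundled with Python is version 3.25 or later, since older SQLite releases do not support window functions. The table layout follows the Book model from the question, and the book names and timestamps are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, time_added TEXT)"
)
# Hypothetical sample data: two books on day 1, three on day 2.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (1, "book1", "2018-05-20 01:00:00"),
        (2, "book2", "2018-05-20 02:00:00"),
        (3, "book3", "2018-05-21 01:00:00"),
        (4, "book4", "2018-05-21 02:00:00"),
        (5, "book5", "2018-05-21 03:00:00"),
    ],
)

# Number the rows within each day by time_added, then keep row number 1.
# Requires SQLite 3.25+ for window function support.
first_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (
            SELECT id,
                   row_number() OVER (
                       PARTITION BY date(time_added)
                       ORDER BY time_added
                   ) AS rn
            FROM books
        )
        WHERE rn = 1
        ORDER BY id
        """
    )
]
print(first_ids)
```

For the sample rows this selects ids 1 and 3, the first book of each day.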

mad_
  • Usually it's best to avoid simply linking to other answers as an "answer": https://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers. Also if you feel that this question is answered by an existing answer, flag as dupe. – Ilja Everilä May 22 '18 at 05:16
  • Yes, row_number can solve my problem. But actually, I'm using sqlite (my fault, I should have made that clear). Thanks a lot – Oliver.X May 22 '18 at 06:11