I am using SQLAlchemy Core with a PostgreSQL database, and am implementing the JSON:API spec.

For pagination I am just using basic LIMIT and OFFSET. So for example with the following two tables:

session
id
name

appearance
id
date
session_id

For session I can simply paginate it like:

SELECT id, name FROM session LIMIT 20 OFFSET 40

Our database can only grow so large due to limits on how long the data can be kept, so I am not worried about the slowdowns the LIMIT OFFSET method can create. In the above example it works fine, the number of records is always equal to the limit, or less if at the end. But my problem comes when I have:

SELECT s.id, s.name, a.date FROM session s JOIN appearance a on s.id = a.session_id LIMIT 3 OFFSET 0

Because there may be, say, two appearance rows referencing the same session, I may end up with something like:

s.id     s.name                 a.date
1        FirstSesh              24/04/14
1        FirstSesh              01/01/20
2        Hello                  09/09/10

Now I have returned only one complete session, plus possibly another complete one or just part of another, instead of three sessions.
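To make the problem concrete, here is a minimal sketch that reproduces it. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the rows (including the third session and the ISO-formatted dates) are made up for illustration. An ORDER BY is added only so the output is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (id INTEGER PRIMARY KEY, date TEXT, session_id INTEGER);
    -- Hypothetical sample data: sessions 1 and 2 have two appearances each.
    INSERT INTO session VALUES (1, 'FirstSesh'), (2, 'Hello'), (3, 'Third');
    INSERT INTO appearance (date, session_id) VALUES
        ('2014-04-24', 1), ('2020-01-01', 1),
        ('2010-09-09', 2), ('2011-05-05', 2),
        ('2012-12-12', 3);
""")

# LIMIT counts joined rows, not sessions.
rows = conn.execute("""
    SELECT s.id, s.name, a.date
    FROM session s
    JOIN appearance a ON s.id = a.session_id
    ORDER BY s.id, a.date
    LIMIT 3 OFFSET 0
""").fetchall()

# Distinct sessions that made it into the "page".
session_ids = sorted({row[0] for row in rows})

for row in rows:
    print(row)
print("sessions on this page:", session_ids)
```

With this data the page holds three rows but only two sessions, and the second session is missing one of its appearances.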

The first solution I thought of was doing:

SELECT s.id, s.name, a.date FROM (SELECT id, name FROM session LIMIT 3 OFFSET 0) s JOIN appearance a on s.id = a.session_id

But that now limits my ability to use WHERE or ORDER BY on s, since the subquery has already cut the result down to three rows. And I can't just put all those conditions inside the subquery, because of the way I've built up the relationships part of JSON:API, and because I also want to restrict which sessions are returned based on appearance columns, e.g. only sessions whose appearance.date is before 2012. That would cause the same issue.

For reference each type in my JSON:API setup has its own query, and then when used in relationships those queries are used as subqueries, which allows simple recursive relationships and simple implementation of new relationships.

If I could instead apply LIMIT and OFFSET to groups of rows, say one group per session id, then I think that might work. But I'm not sure how I'd do that.

GMB
Levi H

1 Answer

You can use window functions. For example, this gives you the first 3 sessions (ordered by id), along with all the corresponding appearances (no matter how many of them match).

SELECT s.id, s.name, a.date
FROM (
    -- number the sessions once, before joining; ROW_NUMBER() starts at 1
    SELECT s.*, ROW_NUMBER() OVER (ORDER BY s.id) AS rn
    FROM session s
) s
INNER JOIN appearance a ON s.id = a.session_id
WHERE s.rn BETWEEN 1 AND 3
ORDER BY s.rn, a.date

You can then "paginate" through the resultset by changing the bounds of the BETWEEN condition.
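As a small sketch of how the BETWEEN bounds relate to the LIMIT and OFFSET values from the question: ROW_NUMBER() starts at 1, so the inclusive bounds are offset + 1 and offset + limit. The helper name rank_bounds is hypothetical and only there for illustration.

```python
from typing import Tuple


def rank_bounds(limit: int, offset: int) -> Tuple[int, int]:
    """Translate LIMIT/OFFSET into inclusive, 1-based bounds for BETWEEN."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    return offset + 1, offset + limit


# LIMIT 3 OFFSET 0 corresponds to rn BETWEEN 1 AND 3.
print(rank_bounds(3, 0))
# LIMIT 20 OFFSET 40 corresponds to rn BETWEEN 41 AND 60.
print(rank_bounds(20, 40))
```

The bounds can then be passed to the query as bind parameters rather than being formatted into the SQL string.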

Edit

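Alternatively, if you also need to filter on columns of appearance, rank the joined rows with DENSE_RANK() so that the filter is applied before the sessions are numbered: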

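SELECT id, name, date
FROM (
    SELECT s.id, s.name, a.date,
        -- every row of the same session gets the same rank, starting at 1
        DENSE_RANK() OVER (ORDER BY s.id) AS rn
    FROM session s
    INNER JOIN appearance a ON s.id = a.session_id
    WHERE a.is_admin  -- example filter; assumes a boolean is_admin column
) c
WHERE rn BETWEEN 1 AND 3
ORDER BY rn, date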
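The DENSE_RANK() approach can be tried out with the sketch below. It again uses Python's sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), stores is_admin as 0/1, and uses made-up data in which only sessions 3 to 6 have an admin appearance. The fetch_page helper is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (
        id INTEGER PRIMARY KEY, date TEXT, session_id INTEGER, is_admin INTEGER
    );
    -- Hypothetical sample data: only sessions 3..6 have admin appearances.
    INSERT INTO session VALUES
        (1, 's1'), (2, 's2'), (3, 's3'), (4, 's4'), (5, 's5'), (6, 's6');
    INSERT INTO appearance (date, session_id, is_admin) VALUES
        ('2020-01-01', 1, 0),
        ('2020-01-02', 2, 0),
        ('2020-01-03', 3, 1), ('2020-01-04', 3, 1),
        ('2020-01-05', 4, 1),
        ('2020-01-06', 5, 1),
        ('2020-01-07', 6, 1);
""")

# The filter runs before DENSE_RANK(), so ranks count only matching sessions.
PAGE_QUERY = """
    SELECT id, name, date
    FROM (
        SELECT s.id, s.name, a.date,
               DENSE_RANK() OVER (ORDER BY s.id) AS rn
        FROM session s
        INNER JOIN appearance a ON s.id = a.session_id
        WHERE a.is_admin
    ) c
    WHERE rn BETWEEN ? AND ?
    ORDER BY rn, date
"""


def fetch_page(limit, offset):
    """Return all joined rows for `limit` sessions, skipping `offset` sessions."""
    return conn.execute(PAGE_QUERY, (offset + 1, offset + limit)).fetchall()


first_page = fetch_page(3, 0)   # rows for sessions 3, 4 and 5
second_page = fetch_page(3, 3)  # rows for session 6
print(first_page)
print(second_page)
```

Because the rank is computed after the WHERE clause, the first page contains three matching sessions with all of their matching appearances, rather than three rows.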
GMB
  • This has the same problem? E.g. let's imagine appearance has another field, is_admin. If I add `WHERE is_admin is True` to that query, it only applies to the three already-selected sessions. So I end up getting no results if those first three sessions didn't have an appearance where is_admin is True. Or if, for example, the third, fourth, fifth, and sixth sessions have an appearance with is_admin, in your version here I only get the one session, the third one, instead of getting the third, fourth, and fifth. – Levi H Oct 02 '20 at 01:23