1

Imagine I'm designing a multi-user blog and I have user, post, and comment tables with the obvious meanings. On the main page, I want to show the ten most recent posts along with all their related comments.

The naive approach would be to SELECT the ten most recent posts, probably JOINed with the users that wrote them. And then I can loop through them to SELECT the comments, again, probably JOINed with the users that wrote them. This would require 11 selects: 1 for the posts and 10 for their comments, hence the name of the famous anti-pattern: n+1 selects.

The usual advice for avoiding this anti-pattern is to use the IDs from the first query to fetch all related comments in a second query which may look something like this:

SELECT
    *
FROM
    comments
WHERE
    post_id IN (/* A comma separated list of post IDs returned from the first query */)

As long as that comma separated list is in reasonably short we managed to fetch all the data we need by issuing only two SELECT queries instead of eleven. Great.

But what if I only want the top three comments for each post? I didn't try but I can probably come up with some LEFT JOIN trickery to fetch the most recent posts along with their top three comments in a single query but I'm not sure it would be scalable. What if I want the top hundred comments which would exceed the join limit of 61 tables of a typical MySQL installation for instance?

What is the usual solution for this other than reverting to n+1 selects anti-pattern? What is the most efficient way to fetch items with a subset of items related to each one in this fairly typical scenario?

GMB
  • 216,147
  • 25
  • 84
  • 135
cyco130
  • 4,654
  • 25
  • 34
  • What is your rdbms? Sql Server, postgres, oracle? What version? – Juan Carlos Oropeza Sep 15 '19 at 19:55
  • @JuanCarlosOropeza I don't think it is really relevant here since it's a theoretical question and I'm mostly interested in a generic solution. But solutions for specific RDBMs are also welcome. – cyco130 Sep 15 '19 at 19:58

3 Answers3

4

It is usually a better option to run as few queries as possible, and then implement some application logic on top of it if needed. In your use case, I would build a query that returns both the most recent posts and the most recent associated comments, with proper ordering to make the application processing easier. Then your application can take care of displaying them.

Assuming that you use MySQL (since you mentionned it in your question), let's start with a query that gives you the 10 most recent posts:

SELECT * FROM posts ORDER BY post_date DESC LIMIT 10

Then you can join this with the corresponding comments:

SELECT
    p.*,
    c.*
FROM 
    (SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
    INNER JOIN comments c ON c.post_id = p.id

Finally, let's set up a limit on the number of comments per posts. For this, you can use ROW_NUMBER() (available in MySQL 8.0) to rank the comments per post, and then filter only the a given number of comments. This gives you the 10 most recent posts along with each of their 3 most recents comments:

SELECT *
FROM (
    SELECT
        p.*,
        c.*,
        ROW_NUMBER() OVER(PARTITION BY p.post_id ORDER BY c.comment_date DESC) rn
    FROM 
        (SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
        INNER JOIN comments c ON c.post_id = p.id
) x
WHERE rn <= 3
ORDER BY p.post_date DESC, c.comment_date DESC

Query results are ordered by post, then by comment date. So when your application fetches the resuts, you get 1 to 3 records per post, in sequence.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 1
    Thanks a lot and +1. I accepted the other answer though because it suggests pretty much the same thing ant it was posted a few minutes earlier. – cyco130 Sep 16 '19 at 07:49
2

If you want the last 10 posts

 SELECT p.post_id
 FROM post p
 ORDER BY p.publish_date DESC
 LIMIT 10

Now if you want the comment of those posts:

SELECT c.comment_id, u.name
FROM comments c
JOIN users u
  on c.user_id = u.user_id
WHERE c.post_id IN ( SELECT p.post_id
                     FROM post p
                     ORDER BY p.publish_date DESC
                     LIMIT 10 )

Now for the last 3 comments is where rdbms version is important so you can use row_number or not:

SELECT *
FROM (
    SELECT c.comment_id, u.name, 
           row_number() over (partition by c.post_id order by c.comment_date DESC) as rn
    FROM comments c
    JOIN users u
      on c.user_id = u.user_id
    WHERE c.post_id IN ( SELECT p.post_id
                         FROM post p
                         ORDER BY p.publish_date DESC
                         LIMIT 10 )
    ) x
WHERE x.rn <= 3
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • Cool, I didn't know about `row_count`, it certainly does the trick and seems to be well supported in recent versions of most popular RDBMSs. – cyco130 Sep 16 '19 at 07:47
0

You can do this in one query:

select . . .   -- whatever columns you want here
from (select p.*
      from posts p
      order by <datecol> desc
      fetch first 10 rows only
     ) p join
     users u
     on p.user_id = u.user_id join
     comments c
     on c.post_id = p.post_id;

This returns the posts/users/comments in one table, mixing the columns. But it only requires one query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This does not provide a solution to limit the number of comments per post. – GMB Sep 15 '19 at 20:42
  • @GMB . . . That is because the question says: " I want to show the ten most recent posts along with all their related comments." – Gordon Linoff Sep 15 '19 at 22:06
  • @GordonLinoff yes but only to build the question's premise. The actual question is at the bottom. – cyco130 Sep 16 '19 at 07:46