1

In an app I'm working in, I've got a Comments table in my Postgres DB that which can be simplified to something like this:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
| 1  | NULL      | 1               | 2023-08-01 12:00:00 |
| 2  | NULL      | 2               | 2023-08-01 12:00:01 |
| 3  | NULL      | 2               | 2023-08-01 12:00:02 |
| 4  | 3         | 1               | 2023-08-01 12:00:03 |
| 5  | 2         | 1               | 2023-08-01 12:00:04 |
| 6  | 1         | 1               | 2023-08-01 12:00:05 |
| 7  | 2         | 2               | 2023-08-01 12:00:06 |
| 8  | 2         | 1               | 2023-08-01 12:00:07 |
+----+-----------+-----------------+---------------------+

The parent_id column, when populated, refers to another row in the comment table. This sets up a parent/child relationship. In other words, a comment can have other comments that are replies. Any parent comment can have many replies. For the purposes of this app, the comment parent/child relationship is only one level deep.

There's also a Group Members table that is referenced by the comments table via a foreign key. This represents each comment's author, and can be reduced to this:

+----+---------------+
| id |     name      |
+----+---------------+
|  1 | Johnny Tables |
|  2 | Susan Select  |
+----+---------------+

For any given group member, I want to find the latest distinct root comments to which they've replied, in the order they've replied to them. For instance, Johnny Tables (id 1) would get:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
|  2 | NULL      |               2 | 2023-08-01 12:00:01 |
|  1 | NULL      |               1 | 2023-08-01 12:00:00 |
|  3 | NULL      |               2 | 2023-08-01 12:00:02 |
+----+-----------+-----------------+---------------------+

And Susan Select (id 2) would get:

+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id |     created_at      |
+----+-----------+-----------------+---------------------+
|  2 | NULL      |               2 | 2023-08-01 12:00:01 |
+----+-----------+-----------------+---------------------+

Here is a fiddle that sets up the table structure with sample data.

I've tried various incantations with subqueries and DISTINCT and ORDER BY, but I'm missing something. With this query, for example, I seem to get the correct replies, but they don't actually get ordered by the created_at date. They're ordered by ascending parent_ids

SELECT DISTINCT ON (parent_id)
    parent_id,
    created_at
FROM
    comments
WHERE
    comments.group_member_id = 1
    AND comments.parent_id IS NOT NULL
ORDER BY
    comments.parent_id,
    comments.created_at DESC

And once I have that, I don't quite know how to leverage that to get the root comments and keep them in the chronological order of the replies. What am I missing?

Note: While I'm asking for general SQL guidance here, since the app I'm working on is a Rails app, an Active Record or Arel interpretation would also be welcome.

EDIT: Added sample tables and expected results.

Michael Bester
  • 315
  • 4
  • 11
  • 1
    Please update your question with sample data and the expected/desired results of that data. Add these as formatted text (see [here](https://senseful.github.io/text-table/)) - **no images**. Further a [fiddle](https://dbfiddle.uk/) would be appreciated. – Belayer Aug 05 '23 at 22:41
  • you would need to use recursive cte – nbk Aug 05 '23 at 23:40
  • Please add a few rows of sample data and the expected result. – The Impaler Aug 06 '23 at 02:04
  • Thanks for the formatted sample data tip @Belayer, I've added that. – Michael Bester Aug 06 '23 at 11:28
  • @nbk Give that I've clarified the question with some sample data and the desired result, can you expand on how you might use recursive common table expressions to handle this? – Michael Bester Aug 06 '23 at 15:23

1 Answers1

1

Your fiddle data looks to have the parent_id and group_member_id columns transposed.

Since there is only a single-level in the root/child comment hierarchy, you have overcomplicated your query.

This is a simple group by with a join back into the root comment:

select gm.name, r.id, r.group_member_id, r.parent_id, r.created_at,
       max(c.created_at) as last_reply_at
  from comments c
       join group_members gm on gm.id = c.group_member_id
       join comments r on r.id = c.parent_id
 group by gm.name, r.id, r.group_member_id, r.parent_id, r.created_at
 order by gm.name, max(c.created_at) desc;

Updated fiddle

Mike Organek
  • 11,647
  • 3
  • 11
  • 26
  • Thank you so much, Mike! I still haven't fully wrapped my head around `group by` clauses, so this is extremely helpful. – Michael Bester Aug 06 '23 at 17:29
  • @MichaelBester If simplifying further helps you, then please try thinking about the query without the join back into `comments r`. The only reason I joined back to `comments r` is because it looked like you wanted the entire row for the root comment. The first step is "give me all the latest (by `created_at`) comments by `group_member_id` per `parent_id` from the `comments` table." You can then order by the latest `created_at`. The join back to `comments r` does not figure into the selection or ordering of rows--it's there only to get those root comment columns. – Mike Organek Aug 06 '23 at 19:32