In an app I'm working in, I've got a Comments table in my Postgres DB that which can be simplified to something like this:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 1 | NULL | 1 | 2023-08-01 12:00:00 |
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
| 3 | NULL | 2 | 2023-08-01 12:00:02 |
| 4 | 3 | 1 | 2023-08-01 12:00:03 |
| 5 | 2 | 1 | 2023-08-01 12:00:04 |
| 6 | 1 | 1 | 2023-08-01 12:00:05 |
| 7 | 2 | 2 | 2023-08-01 12:00:06 |
| 8 | 2 | 1 | 2023-08-01 12:00:07 |
+----+-----------+-----------------+---------------------+
The parent_id
column, when populated, refers to another row in the comment table. This sets up a parent/child relationship. In other words, a comment can have other comments that are replies. Any parent comment can have many replies. For the purposes of this app, the comment parent/child relationship is only one level deep.
There's also a Group Members table that is referenced by the comments table via a foreign key. This represents each comment's author, and can be reduced to this:
+----+---------------+
| id | name |
+----+---------------+
| 1 | Johnny Tables |
| 2 | Susan Select |
+----+---------------+
For any given group member, I want to find the latest distinct root comments to which they've replied, in the order they've replied to them. For instance, Johnny Tables
(id 1) would get:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
| 1 | NULL | 1 | 2023-08-01 12:00:00 |
| 3 | NULL | 2 | 2023-08-01 12:00:02 |
+----+-----------+-----------------+---------------------+
And Susan Select
(id 2) would get:
+----+-----------+-----------------+---------------------+
| id | parent_id | group_member_id | created_at |
+----+-----------+-----------------+---------------------+
| 2 | NULL | 2 | 2023-08-01 12:00:01 |
+----+-----------+-----------------+---------------------+
Here is a fiddle that sets up the table structure with sample data.
I've tried various incantations with subqueries and DISTINCT
and ORDER BY
, but I'm missing something. With this query, for example, I seem to get the correct replies, but they don't actually get ordered by the created_at
date. They're ordered by ascending parent_id
s
SELECT DISTINCT ON (parent_id)
parent_id,
created_at
FROM
comments
WHERE
comments.group_member_id = 1
AND comments.parent_id IS NOT NULL
ORDER BY
comments.parent_id,
comments.created_at DESC
And once I have that, I don't quite know how to leverage that to get the root comments and keep them in the chronological order of the replies. What am I missing?
Note: While I'm asking for general SQL guidance here, since the app I'm working on is a Rails app, an Active Record or Arel interpretation would also be welcome.
EDIT: Added sample tables and expected results.