Postgres limit number of rows for each in WHERE IN id from another table

Question

I have a messaging app where I need to return all of the conversations that a user is part of and the messages associated with each one. I'd like to limit the number of messages per conversation.

Table structure is as follows:

Users

| id   | name | email    | created_at |
|------|------|----------|------------|
| 1    | Bob  | a@b.com  | timestamp  |
| 2    | Tom  | b@b.com  | timestamp  |
| 3    | Mary | c@b.com  | timestamp  |

Messages

| id   | sender_id | conversation_id  | message | created_at |
|------|-----------|------------------|---------|------------|
| 1    | 1         | 1                | text    | timestamp  |
| 2    | 2         | 2                | text    | timestamp  |
| 3    | 2         | 1                | text    | timestamp  |
| 4    | 3         | 3                | text    | timestamp  |

Conversations

| id | created_at |
|----|------------|
| 1  | timestamp  |
| 2  | timestamp  |
| 3  | timestamp  |

Conversations_Users

| id | user_id | conversation_id |
|----|---------|-----------------|
| 1  | 1       | 1               |
| 2  | 2       | 1               |
| 3  | 2       | 2               |
| 3  | 3       | 2               |
| 4  | 3       | 3               |
| 5  | 1       | 3               |

I want to load up all the conversations that user (id 1) is in (In the case of the example - conversation 1 and 3). For each conversation I need the messages associated with it, grouped by conversation_id, ordered by created_at ASC. My current query handles this:

SELECT
    *
FROM
    messages
WHERE
    conversation_id IN (
        SELECT
            conversation_id
        FROM
            conversations_users
        WHERE
            user_id = 1
    )
ORDER BY
    conversation_id, created_at ASC;

However, that would stick a lot of data into memory. Therefore, I'd like to limit the number of messages per conversation.

I've looked at rank() and ROW_NUMBER() but am unsure how to implement them/if they are what's needed.

score 2 · Accepted Answer · answered Nov 26 '19 at 01:01

You can indeed use row_number(). The following query will give you the last 10 messages per conversation of the given user:

select *
from (
    select 
        m.*, 
        row_number() over(
            partition by cu.user_id, m.conversation_id 
            order by m.created_at desc
        ) rn
    from messages m
    inner join conversations_users cu 
        on  cu.conversation_id  = m.conversation_id 
        and cu.user_id = 1
) t
where rn <= 10
order by conversation_id, created_at desc

Notes:

I turned the subquery with in to a regular join, since I believe that it is a neater way to express your requirement
I added the id of the user to the partitioning clause; so, if you remove the where clause that filters on the user, you get the 10 last messages of the conversations of each and every user

Thank you very much. This worked exactly as intended and I appreciate the extra step you took. @GMB could you explain what the pieces are that it's going. I love the answer, but for learning, it would be great to have an understanding of _why_ this works. Thank you if you're able to provide that info! — ryanpback, Nov 26 '19 at 05:06

score 1 · Answer 2 · answered Nov 26 '19 at 01:00

this is a sample to limit conversations per 100 users using row_number(). in descending order to get the latest conversations.

select * from 
messages t1
inner join(
    select row_number() over (partition by user_id order by conversation_id desc) rn, conversation_id, user_id
    from conversations_users) t2 on t1.user_id = t2.user_id
where rn <= 100
order by created_at asc;

score 1 · Answer 3 · answered Nov 26 '19 at 01:03

You can use ROW_NUMBER() to limit the messages per conversation. To get the most recent :

SELECT m.*
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY m.conversation_id ORDER BY m.created_at DESC) as seqnum
      FROM messages m
     ) m JOIN
     conversation_users cu
     ON m.conversation_id = cu.conversation_id
WHERE cu.user_id = 1 AND seqnum <= <n>
ORDER BY m.conversation_id, m.created_at ASC;

An alternative method uses a lateral join:

select m.*
from conversation_users cu cross join lateral
     (select m.*
      from messages m
      where m.conversation_id = cu.conversation_id
      order by m.created_at desc
      limit <n>
     ) m
where cu.user_id = 1
order by m.message_id, m.created_at;

I think this might have better performance on larger data, but you would need to test that.

Postgres limit number of rows for each in WHERE IN id from another table

3 Answers3