2

I have a messaging app where I need to return all of the conversations that a user is part of and the messages associated with each one. I'd like to limit the number of messages per conversation.

Table structure is as follows:

Users

| id   | name | email    | created_at |
|------|------|----------|------------|
| 1    | Bob  | a@b.com  | timestamp  |
| 2    | Tom  | b@b.com  | timestamp  |
| 3    | Mary | c@b.com  | timestamp  |

Messages

| id   | sender_id | conversation_id  | message | created_at |
|------|-----------|------------------|---------|------------|
| 1    | 1         | 1                | text    | timestamp  |
| 2    | 2         | 2                | text    | timestamp  |
| 3    | 2         | 1                | text    | timestamp  |
| 4    | 3         | 3                | text    | timestamp  |

Conversations

| id | created_at |
|----|------------|
| 1  | timestamp  |
| 2  | timestamp  |
| 3  | timestamp  |

Conversations_Users

| id | user_id | conversation_id |
|----|---------|-----------------|
| 1  | 1       | 1               |
| 2  | 2       | 1               |
| 3  | 2       | 2               |
| 3  | 3       | 2               |
| 4  | 3       | 3               |
| 5  | 1       | 3               |

I want to load up all the conversations that user (id 1) is in (In the case of the example - conversation 1 and 3). For each conversation I need the messages associated with it, grouped by conversation_id, ordered by created_at ASC. My current query handles this:

SELECT
    *
FROM
    messages
WHERE
    conversation_id IN (
        SELECT
            conversation_id
        FROM
            conversations_users
        WHERE
            user_id = 1
    )
ORDER BY
    conversation_id, created_at ASC;

However, that would stick a lot of data into memory. Therefore, I'd like to limit the number of messages per conversation.

I've looked at rank() and ROW_NUMBER() but am unsure how to implement them/if they are what's needed.

GMB
  • 216,147
  • 25
  • 84
  • 135
ryanpback
  • 275
  • 4
  • 17

3 Answers3

2

You can indeed use row_number(). The following query will give you the last 10 messages per conversation of the given user:

select *
from (
    select 
        m.*, 
        row_number() over(
            partition by cu.user_id, m.conversation_id 
            order by m.created_at desc
        ) rn
    from messages m
    inner join conversations_users cu 
        on  cu.conversation_id  = m.conversation_id 
        and cu.user_id = 1
) t
where rn <= 10
order by conversation_id, created_at desc

Notes:

  • I turned the subquery with in to a regular join, since I believe that it is a neater way to express your requirement

  • I added the id of the user to the partitioning clause; so, if you remove the where clause that filters on the user, you get the 10 last messages of the conversations of each and every user

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you very much. This worked exactly as intended and I appreciate the extra step you took. @GMB could you explain what the pieces are that it's going. I love the answer, but for learning, it would be great to have an understanding of _why_ this works. Thank you if you're able to provide that info! – ryanpback Nov 26 '19 at 05:06
1

this is a sample to limit conversations per 100 users using row_number(). in descending order to get the latest conversations.

select * from 
messages t1
inner join(
    select row_number() over (partition by user_id order by conversation_id desc) rn, conversation_id, user_id
    from conversations_users) t2 on t1.user_id = t2.user_id
where rn <= 100
order by created_at asc;
Ed Bangga
  • 12,879
  • 4
  • 16
  • 30
1

You can use ROW_NUMBER() to limit the messages per conversation. To get the most recent :

SELECT m.*
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY m.conversation_id ORDER BY m.created_at DESC) as seqnum
      FROM messages m
     ) m JOIN
     conversation_users cu
     ON m.conversation_id = cu.conversation_id
WHERE cu.user_id = 1 AND seqnum <= <n>
ORDER BY m.conversation_id, m.created_at ASC;

An alternative method uses a lateral join:

select m.*
from conversation_users cu cross join lateral
     (select m.*
      from messages m
      where m.conversation_id = cu.conversation_id
      order by m.created_at desc
      limit <n>
     ) m
where cu.user_id = 1
order by m.message_id, m.created_at;

I think this might have better performance on larger data, but you would need to test that.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786