3

we are using cassandra to store chat data. we have to maintain all the users conversations and sort them based on the messages received.

Conversation: Is a stream of messages between 2 or more users. Conversations does not have an endtime and are continuous.

We are maintaining a table which has "user to conversations" based on the conversation create time. But we would like to sort it based on the messages received.

Solutions

  • Solution -1:

I have looked at a similar question Cassandra: List 10 most recently modified records

which suggests to use materialized view. Is this approach fine for frequently updated records like messages.

  • Solution -2: Use another table to maintain user_recentconversations in a list. Update the list with when the messages are received/sent to/by the user.

Considering the usecase please suggest what would be better solution.

Community
  • 1
  • 1
  • Do you want to sort them chronologically "globally"? Or do you want to sort them conversation by conversation? – xmas79 Jan 07 '17 at 09:22
  • We would like to sort all the users conversations. Not required to sort globally. Similar to whatsapp conversations the latest converstaion should come onto the top as soon as there is a new message in the conversation. – srikanth vukkadapu Jan 09 '17 at 17:38
  • After my answer and your comment, I think your question is not clear. Moreover you are using Cassandra-2.1 but your proposed solution 1 suggests Materialized Views, which are available only on Cassandra 3.0. – xmas79 Jan 10 '17 at 14:04

1 Answers1

0

What you actually want is to sort all the messages globally. Indeed, sorting conversations by the most recent message means sort all the messages as well.

An easy way to solve your problem is to create another table where you store each arrived message for the purpose of ordering the conversations:

CREATE TABLE conversations_by_most_recent_message (
    conversation_id int,
    ts timestamp,
    message text,
    PRIMARY KEY (conversation_id, ts)
) WITH CLUSTERING ORDER BY (ts DESC);

Whenever a message arrives you store it there. In this table you will have all the messages pertaining each conversation reverse-ordered chronologically. Getting data from this table will help you to know the timestamp of the last message on per-conversation basis.

The second step you need to do is getting the most recent record from each partition, and this can be easily solved exploiting the PER PARTITION LIMIT clause introduced in Cassandra 3.6:

SELECT * FROM conversations_by_most_recent_message PER PARTITION LIMIT 1;

This will fetch only the first record from each partition, that is only the most recent message from each conversation.

The final step is to sort (by timestamp desc) the rows retrieved at application level.

xmas79
  • 5,060
  • 2
  • 14
  • 35
  • As we need to retrieve the conversations on a per user basic.This will not work as it will retrieve all the different partitions for all the users. Also we are using cassandra 2.1. – srikanth vukkadapu Jan 10 '17 at 12:38