0

Let's say we have a chat room. Each chat room has several participants. In my case, it's one client and several operators (1 or more). Also, each chat room has its own specific theme. As we know in the Cassandra database, tables need to be designed based on queries.

Let's say I have such queries:

  1. Get a list of all open chat rooms that don't have operators yet.
  2. Get a list of all open chats that have a specific operator.
  3. Get a list of all open chat rooms on a specific theme that don't have operators yet.
  4. Get a list of all open chats on a specific theme that have a specific operator.

I know that denormalization is normal for Cassandra. But still, do I need to create separate tables for all these queries? I would also like to know how to check whether a value is included in the list in a SELECT query? As you can see, the operators column in my case has the list data type.

At this particular moment, for the first two queries, I created the following table:

CREATE TABLE IF NOT EXISTS "chat_rooms" (
    room_id uuid,
    created_at timestamp,
    updated_at timestamp,
    client varchar,
    operators frozen list<varchar>,
    last_message text,
    unread_message_number int,
    is_open boolean,
    theme varchar,
    PRIMARY KEY (
        (is_open, operators),
        updated_at
    )
) WITH CLUSTERING ORDER BY (updated_at desc);

For the third and fourth queries, I just changed the partition key and added the theme column. How correct is this?

Nurzhan Nogerbek
  • 4,806
  • 16
  • 87
  • 193
  • You have to provide all columns of partition key to find the correct partition in Cassandra. So you have to design tables according to your queries and if that require creating multiple tables then you have to do it. – Manish Khandelwal Sep 21 '20 at 05:05
  • Yes, you're right. As you can see, I use the `operators` column as the `partition key`. The data type of this column is the `list`, which is `mutable`. I know that it is not good practice to use a `mutable` data type as an element of the `primary key`. As I said before, each chat room has one client and several operators (1 or more). Let's say I want to create a `SELECT` query that returns a list of all private chat rooms of a certain operator. Something like `SELECT * FROM "CHAT_ROOMS" WHERE STATUS = 'private" AND "Mark" in operators`. How do I do this correctly? Do you have any ideas? – Nurzhan Nogerbek Sep 21 '20 at 05:43
  • 1
    You have to design your table according to your query. In this case you can have a table (chat_room_by_operator). operator can be the partition key. status and chat_room_id can be your clustering key. So if you search "SELECT * from chat_room_by_opertaror where operator ='Mark' and status='private', you will get all private chat rooms operated by Mark. – Manish Khandelwal Sep 21 '20 at 06:02
  • @ManishKhandelwal thank you for your answer. Why you use the `status` column as `clustering key`? It seems like this column must be a part of the `partition key`. Mark can also have open chat rooms instead of private ones. Don't you think so? – Nurzhan Nogerbek Sep 21 '20 at 06:38
  • You can do that as well. Just gave an example. There are many ways of doing things. and you know your data model better. – Manish Khandelwal Sep 21 '20 at 07:13

0 Answers0