15

I have gone though this article and here is the schema I have got from it. This is helpful for my application for maintaining statuses of a user, but how can I extend this to maintain one to one chat archive and relations between users, relations mean people belong to specific group for me. I am new to this and need an approach for this.

Requirements :

  • I want to store messages between user-user in a table.
  • Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.
  • I want to retrieve all the messages from different users to the user when user has requested.
  • And also want to store class of users. I mean for example user1 and user2 belong to "family" user3, user4, user1 belong to friends etc... This group can be custom name given by the user.

This is what I have tried so far:

CREATE TABLE chatarchive (
   chat_id uuid PRIMARY KEY,
   username text,
   body text
)

CREATE TABLE chatseries (
username text,
    time timeuuid,
    chat_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)

CREATE TABLE chattimeline (
    to text,
username text,
    time timeuuid,
    chat_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)

Below is the schema that I currently have:

CREATE TABLE users (
   username text PRIMARY KEY,
   password text
)    

CREATE TABLE friends (
    username text,
    friend text,
    since timestamp,
    PRIMARY KEY (username, friend)
)

CREATE TABLE followers (
    username text,
    follower text,
    since timestamp,
    PRIMARY KEY (username, follower)
)

CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    username text,
    body text
)
CREATE TABLE userline (
    username text,
    time timeuuid,
    tweet_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)

CREATE TABLE timeline (
    username text,
    time timeuuid,
    tweet_id uuid,
    PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
joscas
  • 7,474
  • 5
  • 39
  • 59
Exception
  • 8,111
  • 22
  • 85
  • 136
  • 1
    Please describe more specific: what is your concrete problem? And what have you tried so far? – mschenk74 Jun 12 '14 at 05:38
  • @mschenk74 Thanks for the reply. I will add what I have tried – Exception Jun 12 '14 at 05:45
  • @Exception I'd like to know whether my answer helped at all or covered the requirements. – Tamer Tas Jun 21 '14 at 22:23
  • @TamerTas Yes it helped, I would like to share the bounty to you guys. I will start another bounty on this after this and will give it to you. Please don't think your answer was not helpful, and the same answer was already posted in db admin site of stackoverflow by the original author. But I would like to give bounty for your true effort. I am very thankful for it. – Exception Jun 21 '14 at 23:10

2 Answers2

23

With C* you need to store data in the way you'll use it. So let's see how this would look like for this case:

  • I want to store messages between user-user in a table.
  • Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.

    CREATE TABLE chat_messages (
        message_id uuid,
        from_user text,
        to_user text,
        body text,
        class text,
        time timeuuid,
        PRIMARY KEY ((from_user, to_user), time)
    ) WITH CLUSTERING ORDER BY (time ASC);
    

This will allow you to retrieve a timeline of messages between two users. Note that a composite primary key is used so that wide rows are created for each pair of users.

SELECT * FROM chat_messages WHERE from_user = 'mike' AND to_user = 'john' ORDER BY time DESC ;

  • I want to retrieve all the messages from different users to the user when user has requested.

CREATE INDEX chat_messages_to_user ON chat_messages (to_user);

This allows you to do:

SELECT * FROM chat_messages WHERE to_user = 'john';
  • And also want to store class of users. I mean for example user1 and user2 belong to "family" user3, user4, user1 belong to friends etc... This group can be custom name given by the user.

CREATE INDEX chat_messages_class ON chat_messages (class);

This will allow you to do:

SELECT * FROM chat_messages WHERE class = 'family';

Note that in this kind of database, DENORMALIZED DATA IS A GOOD PRACTICE. This means that using the name of the class again and again is not a bad practice.

Also note that I haven't used a 'chat_id' nor a 'chats' table. We could easily add this but I feel that your use case didn't require it as it has been put forward. In general, you cannot do joins in C*. So, using a chat id would imply two queries.

EDIT: Secondary indexes are inefficient. A materialised view will be a better implementation with C* 3.0

Alex Weitz
  • 3,199
  • 4
  • 34
  • 57
joscas
  • 7,474
  • 5
  • 39
  • 59
  • Both the answers helps me a lot. I would like to share the bounty to both of you – Exception Jun 21 '14 at 07:09
  • Could you please add a gist of adding a all queries together. It would be very helpful for me to be on right track and refer. Thanks for the help – Exception Jun 21 '14 at 07:11
  • @Exception, please see this example gist that you can use to create the keyspace: https://gist.github.com/joscas/65f5c5af1ed5b8c12f49 – joscas Jun 21 '14 at 10:41
  • This other gist add some messages to the chat: https://gist.github.com/joscas/3b951a59ef809747401d It's a shell script, so that you can execute locally. Also note the handling of the uuids which is not always obvious – joscas Jun 21 '14 at 11:23
  • Could you please add a solution where I can add user details and another table where a user can belong to multiple classes. – Exception Sep 20 '14 at 16:23
  • 2
    Wouldn't the index: CREATE INDEX chat_messages_to_user ON chat_messages (to_user); become more and more inefficient as the number of unique user-user chats, N, grows. On a distributed system like Cassandra, the data will be partitioned by (from_user, to_user), meaning the index could reference multiple partitions in different clusters. Cassandra would have to query from multiple partitions and composite your request. This is not an issue if N is small, but if N is large and very fragmented it could become inefficient. You would likely want to create an entirely new table and denormalize. – Alexander Wong Sep 11 '15 at 07:46
  • The secondary index should be avoided here. May be a case for a materialized view though after 3.0 – Chris Lohfink Oct 13 '15 at 20:47
9

There is a chat application created by Alan Chandler on github that has the features you request:

It uses a 2-phase authentication. First the user is validated in the forums and then, the user is validated on the chat database.

Here's the first validation part of the schema (schema located in inc/user.sql):

BEGIN;

CREATE TABLE users (
  uid integer primary key autoincrement NOT NULL,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  name character varying NOT NULL,
  role text NOT NULL DEFAULT 'R',      -- A (CEO), L (DIRECTOR), G (DEPT HEAD), H (SPONSOR) R(REGULAR)
  cap integer DEFAULT 0 NOT NULL,      -- 1 = blind, 2 = committee secretary, 4 = admin, 8 = mod, 16 = speaker 32 = can't whisper( OR of capabilities).
  password character varying NOT NULL, -- raw password
  rooms character varying,             -- a ":" separated list of rooms nos which define which rooms the user can go in
  isguest boolean DEFAULT 0 NOT NULL
);
CREATE INDEX userindex ON users(name);
-- Below here you can add the specific users for your set up in the form of INSERT Statements

-- This list is test users to cover the complete range of functions. Note names are converted to lowercase, so only put lowercase names in here
INSERT INTO users(uid,name,role,cap,password,rooms,isguest) VALUES
(1,'alice','A',4,'password','7',0),     -- CEO class user alice
(2,'bob','L',3,'password','8',0),       -- DIRECTOR class user bob 
(3,'carol','G',2,'password','7:8:9',0), -- DEPT HEAD class user carol



And here's the second validation part of the schema (schema located in data/chat.sql):

CREATE TABLE users (
  uid integer primary key NOT NULL,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  name character varying NOT NULL,
  role char(1) NOT NULL default 'R',
  rid integer NOT NULL default 0,
  mod char(1) NOT NULL default 'N',
  question character varying,
  private integer NOT NULL default 0,
  cap integer NOT NULL default 0,
  rooms character_varying 
);



The following is the schema of the chat rooms you can see the user classes and the examples of it:

CREATE TABLE rooms (
  rid integer primary key NOT NULL,
  name varchar(30) NOT NULL,
  type integer NOT NULL -- 0 = Open, 1 = meeting, 2 = guests can't speak, 3 moderated, 4 members(adult) only, 5 guests(child) only, 6 creaky door
) ;

INSERT INTO rooms (rid, name, type) VALUES 
(1, 'The Forum', 0),
(2, 'Operations Gallery', 2),  -- Guests Can't Speak
(3, 'Dungeon Club', 6),        -- creaky door
(4, 'Auditorium', 3),          -- Moderated Room
(5, 'Blue Room', 4),           -- Members Only (in Melinda's Backups this is Adults)
(6, 'Green Room', 5),          -- Guest Only (in Melinda's Backups this is Juveniles AKA Baby Backups)
(7, 'The Board Room', 1),      -- Various meeting rooms - need to be on users room list



The users have another table to indicate the participation of the conversation:

CREATE table wid_sequence ( value integer);
INSERT INTO wid_sequence (value) VALUES (1);

CREATE TABLE participant (
  uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
  wid integer NOT NULL,
  primary key (uid,wid)
);



And the archives are recorded as follows:

CREATE TABLE chat_log (
  lid integer primary key,
  time bigint DEFAULT (strftime('%s','now')) NOT NULL,
  uid integer NOT NULL REFERENCES user (uid) ON DELETE CASCADE ON UPDATE CASCADE,
  name character varying NOT NULL,
  role char(1) NOT NULL,
  rid integer NOT NULL,
  type char(2) NOT NULL,
  text character varying
);

Edit: However this type of data modeling is not very suitable for Cassandra. Because, in Cassandra your data does not fit on one machine so joins are not available. So, in Cassandra denormalizing data is the practical choice. Check below for the denormalized version of chat_log table:

CREATE TABLE chat_log (
  lid uuid,
  time timestamp,
  sender text NOT NULL,
  receiver text NOT NULL,
  room text NOT NULL,
  sender_role varchar NOT NULL,
  receiver_role varchar NOT NULL,
  rid decimal NOT NULL,
  status varchar NOT NULL,
  message text,
  PRIMARY KEY (sender, receiver, room)
  -- PRIMARY KEY (sender, receiver) if you don't want the messages to be separated by the rooms
) WITH CLUSTERING ORDER BY (time ASC);

Now in order to retrieve data you'd use the following queries:

Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.

SELECT * FROM chat_log WHERE sender = 'bob' ORDER BY time ASC

I want to retrieve all the messages from different users to the user when user has requested.

SELECT * FROM chat_log WHERE receiver = 'alice' ORDER BY time ASC

I want to store and retrieve class of users.

SELECT * FROM chat_log WHERE sender_role = 'A' ORDER BY time ASC -- messages sent by CEOs

SELECT * FROM chat_log WHERE receiver_role = 'A' ORDER BY time ASC -- messages received by CEOs


After modeling the data. You'd need to create indexes for quick and efficient querying as follows:

  • For retrieving all messages from different users to the user efficiently

CREATE INDEX chat_log_uid ON chat_log (sender);
CREATE INDEX chat_log_uid ON chat_log (receiver);

  • For retrieving all messages from user classes efficiently

CREATE INDEX chat_log_class ON chat_log (sender_role);
CREATE INDEX chat_log_class ON chat_log (receiver_role);


I believe these examples will give you the approach you need.

If you'd like to learn more about Cassandra data modeling you can check down below:

Tamer Tas
  • 3,288
  • 13
  • 22
  • 1
    I wouldn't expect that code pasted from a github repo to fit fully the OPs scenario. Besides, what would be the CQL queries needed by the App? – joscas Jun 21 '14 at 00:12
  • @joscas the OP looks for a direction to go and `github` happens to have a very good example. `CQL` queries are trivial after having a good schema. But, you are correct I'll update my answer with `CQL` queries – Tamer Tas Jun 21 '14 at 00:21
  • @TamerTas Both the answers helps me a lot. I would like to share the bounty to both of you – Exception Jun 21 '14 at 07:08
  • @Exception http://meta.stackexchange.com/questions/93385/can-bounties-be-divided-amongst-answers – Tamer Tas Jun 21 '14 at 09:30
  • 4
    @TamerTas, your example doesn't work. You have pasted SQL (not CQL) from an application that doesn't use Cassandra at all. If you try to input that code in cqlsh it errors. However, I respect the OPs point of view that your example can give an idea of how a generic chat application works. – joscas Jun 21 '14 at 10:50
  • @joscas Schema creation process is similar on all types. There are some difference like normalization in `CQL` and data type syntax. So, showing an example chat application should be helpful `CQL` or not. I included `CQL` schema as well. You are downvoting me because I tried to teach how to model instead of writing a `CQL` schema. – Tamer Tas Jun 21 '14 at 11:37
  • 2
    The differences in data modelling are huge in a relational DB versus an eventually consistent DB as Cassandra. One can't be used as a base for the other. As I said in my response, for Cassandra you need to organize data in the way you query whereas in a relational DB you aim for data normalization. This leads to very different models. – joscas Jun 21 '14 at 11:49
  • @joscas Like I said. Data needs to be denormalized in `CQL`. But that's not a huge difference and even trivial. I was more focused on how to model a chat application generally. I also included the `CQL` schema – Tamer Tas Jun 21 '14 at 11:54
  • TamerTas and @Joscas Your both answers are helpful for me. For now I would like to give bounty to Joscas and after this again I will start another bounty and will give it to Tamar Tas. I stand on my words and you can check my profile to check whether I stand on my words or not. – Exception Jun 21 '14 at 23:13
  • @TamerTas I started bounty, will reward it to you :). Can be done only after 23 hours. – Exception Jun 21 '14 at 23:18