
I am new to Couchbase, and I would like to understand how to model the storage of billions of chat messages originating from a typical IM app. What would be the correct way to model this in Couchbase? Assume 10,000 new message inserts/sec and 40,000 updates/sec on those messages. Assume one-to-one chat as the primary use case, although each person would have many buddies - pretty much like WhatsApp.

Thanks, appreciate all feedback.

**Update:**

Thanks for your reply, here is my database design:

[image: database design diagram]

Sample data stored in Couchbase (document store):

document User:

123_user => {"id" : 123, "friend_ids" : [456, 789, ...], "session": "123asdcas123123qsd"}

document History Message (channel_name = userId1 + "-to-" + userId2)

123-to-456_history => {"channel_name": "123-to-456", "message_ids": ["545_message", "999_message", ....]}

document Message:

545_message => {"id": 545, "client_id": 4143413, "from_uid": 123, "to_uid": 456, "body": "Hello world", "create_time": 1243124124, "state": 1}
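For illustration, here is a minimal sketch of how these documents could be written with the Couchbase Python SDK; the connection string and the bucket name `chat` are placeholders, not my actual setup:

```python
from couchbase.bucket import Bucket

# Placeholder cluster address and bucket name.
bucket = Bucket('couchbase://localhost/chat')

# User document
bucket.upsert('123_user', {
    'id': 123,
    'friend_ids': [456, 789],
    'session': '123asdcas123123qsd',
})

# History Message document (channel_name = userId1 + "-to-" + userId2)
bucket.upsert('123-to-456_history', {
    'channel_name': '123-to-456',
    'message_ids': ['545_message', '999_message'],
})

# Message document
bucket.upsert('545_message', {
    'id': 545,
    'client_id': 4143413,
    'from_uid': 123,
    'to_uid': 456,
    'body': 'Hello world',
    'create_time': 1243124124,
    'state': 1,
})
```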

There is a problem here: when the message_ids field of the History Message document stores millions or billions of message IDs, reading and writing the message history becomes very expensive. Can anyone suggest a solution to this problem?

  • So what are your typical operations? Insert a new message to a chat room? Get a range of messages for a given chat room ID? – Adi Levin Jan 18 '16 at 11:41
  • This highly depends on the data flow of your application and backend processes. As Adi asked, could you please describe what your data access patterns look/will look like? Do you have latency constraints in addition to the throughput you described? What are the most common data operations? – David Ostrovsky Jan 18 '16 at 16:40
  • Thanks for your reply; I have added further details to the question above. – Nam Vạc Jan 20 '16 at 03:33

1 Answer


First of all, let's put Couchbase aside. The key problem is how to model this application scenario; once the model is clear, we can tell whether Couchbase is the best choice.

A one-to-one chat application can use each pair of chatters as a primary key.

For example, Bob-to-Jack, they chat:

1. "hello!"
2. "go for rest?"
3. "no, i'm busy now."
...

You insert a new record with primary key "Bob-Jack" and value "hello; go for rest; no, ....".
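As a minimal sketch of that insert, assuming the Couchbase Python SDK (the `conversation_key` helper, the bucket name, and the connection string are illustrative, not a specific API):

```python
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/chat')  # placeholder bucket

def conversation_key(user_a, user_b):
    # Sort the pair so "Bob"/"Jack" and "Jack"/"Bob" map to the same record.
    first, second = sorted([user_a, user_b])
    return '%s-%s' % (first, second)

key = conversation_key('Jack', 'Bob')  # -> "Bob-Jack"
bucket.upsert(key, {
    'participants': ['Bob', 'Jack'],
    'messages': [
        {'from': 'Bob', 'body': 'hello!'},
        {'from': 'Jack', 'body': 'go for rest?'},
        {'from': 'Bob', 'body': "no, i'm busy now."},
    ],
})
```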

If the conversation stops, this record stops growing and stays stored for future use.

If the two chat again the next day, your application fetches this record by the key "Bob-Jack", displays yesterday's conversation (the value), and updates the value by appending the new chat content to the end.

The value keeps growing; if it exceeds some threshold, you split it into two records, since many database systems have a size limit for a single record (Couchbase caps a document at 20 MB).
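A sketch of that read-append-split cycle, continuing the sketch above (the 1,000-message threshold and the `::meta` / `::N` key suffixes are arbitrary choices for illustration):

```python
MAX_MESSAGES_PER_DOC = 1000  # arbitrary illustrative threshold

def append_message(bucket, base_key, message):
    # A small pointer document tracks which chunk is currently active,
    # e.g. "Bob-Jack::meta" -> {"current_chunk": 3}.
    meta_key = base_key + '::meta'
    rv = bucket.get(meta_key, quiet=True)
    meta = rv.value if rv.success else {'current_chunk': 0}

    chunk_key = '%s::%d' % (base_key, meta['current_chunk'])
    rv = bucket.get(chunk_key, quiet=True)
    chunk = rv.value if rv.success else {'messages': []}

    # Split: when the current chunk is full, move on to a new record.
    if len(chunk['messages']) >= MAX_MESSAGES_PER_DOC:
        meta['current_chunk'] += 1
        chunk_key = '%s::%d' % (base_key, meta['current_chunk'])
        chunk = {'messages': []}

    chunk['messages'].append(message)
    bucket.upsert(chunk_key, chunk)
    bucket.upsert(meta_key, meta)
```

In a real system you would protect this read-modify-write with CAS (or use Couchbase's sub-document array-append) so that concurrent writers do not overwrite each other.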

One user has many buddies, so in the real world there are billions of pairs (keys), each with a long conversation (value). NoSQL solutions are a good choice for this data volume.

From there you can judge whether Couchbase is capable of this kind of task. I think it is a fine fit, but it is not the only choice.

  • Thanks for your reply; I have added further details to the question above. – Nam Vạc Jan 20 '16 at 03:27
  • @Nam Vạc Following your updated content, I noticed that the model gives every short message its own message ID; in the 123-to-456 chat alone that can consume tens of message IDs, so the total database will easily reach billions of records. That is neither efficient nor productive. An alternative way to store messages is to combine tens (or hundreds) of the 123-to-456 messages together, i.e., they share one message ID and are stored together for fewer disk accesses. This greatly reduces the database size, and therefore speeds up searching, since billions of messages shrink to tens of millions of records (see the sketch after these comments). – neo.carmack Jan 20 '16 at 11:20
  • "they share one message ID and are stored together for fewer disk accesses" - can you tell me more about your idea? Thank you so much. – Nam Vạc Jan 20 '16 at 17:30
  • Can you give me a model or a more detailed example? – Nam Vạc Jan 20 '16 at 17:22
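To illustrate the bucketing idea from the comment above: store a fixed number of messages per document under sequentially numbered keys, and read history a chunk at a time. The chunk size of 100, the key pattern, and the connection details are all assumptions made for this sketch, not something prescribed by Couchbase:

```python
from couchbase.bucket import Bucket

MESSAGES_PER_DOC = 100  # e.g. combine ~100 messages per stored document

def history_key(channel_name, chunk_no):
    # e.g. "123-to-456::0", "123-to-456::1", ...
    return '%s::%d' % (channel_name, chunk_no)

def read_recent_history(bucket, channel_name, current_chunk, pages=2):
    """Fetch the newest `pages` chunk documents for a conversation and
    return their messages oldest-first. One fetch per chunk of messages,
    instead of one fetch per individual message ID."""
    messages = []
    first = max(0, current_chunk - pages + 1)
    for chunk_no in range(first, current_chunk + 1):
        rv = bucket.get(history_key(channel_name, chunk_no), quiet=True)
        if rv.success:
            messages.extend(rv.value['messages'])
    return messages

# Usage sketch:
# bucket = Bucket('couchbase://localhost/chat')
# recent = read_recent_history(bucket, '123-to-456', current_chunk=57)
```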