
I am going to design a group chat application based on MongoDB. There are two schema design choices: one document per group chat message, or one document holding all of a group's messages.

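The first option looks like this:

var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var ObjectId = Schema.Types.ObjectId;

// One document per message
var ChatMessageSchema = new Schema({
  fromUserId: ObjectId,
  toTroupeId: ObjectId,
  text: String,
  sent: Date
});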

The second option looks like this:

// One document per group, with all messages embedded as subdocuments
var ChatMessageSchema = new Schema({
  toTroupeId: ObjectId,
  chats: [{
    fromUserId: ObjectId,
    text: String,
    sent: Date
  }]
});

Both designs have pros and cons. The drawback of the second option is that it is hard to index on the user and search for a given user's messages, and a group with too many messages might force the creation of more than one document.

The first option seems more reasonable, since it allows searching messages by group ID or user ID, provided we index properly.
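To make "index properly" concrete, here is a minimal sketch of what the indexes and the typical group query might look like for the first option. The helper names `addChatIndexes` and `recentGroupMessages` are hypothetical, and the choice of compound indexes (group or user first, then `sent` descending) is an assumption about the access pattern, not something I have measured.

```javascript
// Hypothetical helper: declares two compound indexes on a Mongoose schema
// such as the ChatMessageSchema of the first option.
function addChatIndexes(schema) {
  // Serves "latest messages in a group": equality on group, ordered by time.
  schema.index({ toTroupeId: 1, sent: -1 });
  // Serves "messages by a user"; assumed to be needed, drop it otherwise.
  schema.index({ fromUserId: 1, sent: -1 });
  return schema;
}

// Hypothetical helper: builds the filter, sort and limit for the most
// recent messages of one group.
function recentGroupMessages(troupeId, limit) {
  return {
    filter: { toTroupeId: troupeId },
    sort: { sent: -1 },
    limit: limit
  };
}
```

With a Mongoose model built from that schema, the result could be used as `Model.find(q.filter).sort(q.sort).limit(q.limit)`.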

But I wonder: when a group has hundreds of thousands of messages, there will be hundreds of thousands of corresponding documents for that one group. Will this affect database performance?

Any thoughts on these design choices? Is the first option the optimal one, and if so, how can it be optimised?

user824624

2 Answers

I would suggest a third option: creating a new collection for every group, e.g. room_$groupid. In such a collection, you would insert every message as a separate document. This gives you the benefit of reading a full chat room without a filter; you could simply return the last 200 or so messages from the collection.

It would also scale more easily, because you won't end up with a single massive collection that you have to filter through.

However, you would have to write the logic for selecting the right collection, though that should be a fairly trivial task. The downside is that it would be nearly impossible to do a text search across multiple groups without throwing performance out of the window.
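The collection-selection logic could be sketched roughly as below. The function names are hypothetical, the check that the group ID is a 24-character hex string is an assumption (it matches the string form of an ObjectId), and `db` is assumed to be a database handle from the Node.js MongoDB driver.

```javascript
// Hypothetical helper: maps a group ID to its per-group collection name.
function roomCollectionName(groupId) {
  const id = String(groupId);
  // Assumption: group IDs are ObjectIds, i.e. 24 hex characters as strings.
  if (!/^[0-9a-f]{24}$/i.test(id)) {
    throw new Error('unexpected group id: ' + id);
  }
  return 'room_' + id;
}

// Hypothetical helper: returns the newest messages of one room, no filter.
async function lastRoomMessages(db, groupId, limit = 200) {
  return db
    .collection(roomCollectionName(groupId))
    .find({})
    .sort({ sent: -1 })
    .limit(limit)
    .toArray();
}
```

An index on `sent` in each room collection would still be needed for the sort to stay cheap.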

Collection limit*

Skami

MongoDB is made to handle huge amounts of data, and their PDF "Performance Best Practices for MongoDB" states:

Avoid large documents

This is also made clear by the 16MB limit on document size.

So one can argue that MongoDB is specifically designed to handle hundreds of thousands of documents like the ones in your first schema.
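As a rough back-of-the-envelope check of what the 16MB limit means for the second schema, the sketch below estimates how many embedded messages fit in one document. The 200 bytes per message is purely an assumed average (two short fields plus some text and BSON overhead), and `maxEmbeddedMessages` is a hypothetical helper.

```javascript
// MongoDB's maximum BSON document size, 16 megabytes.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: upper bound on embedded messages per document,
// ignoring the small overhead of the document's other fields.
function maxEmbeddedMessages(avgMessageBytes) {
  return Math.floor(MAX_BSON_BYTES / avgMessageBytes);
}

// Assumed average of 200 bytes per message.
console.log(maxEmbeddedMessages(200)); // 83886
```

Under that assumption, a group with hundreds of thousands of messages would not fit in a single document.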

Simply reduce the number of indexes to what you need (do you really need to query by user that often, or could you accept that query being a lot slower?) and you should be fine with your first schema. Actually, I'm not sure there is any benefit to the second one.

Er...