15

I'm doing a relationship with many to many with mongoDb and mongoose.js, i know that there is many options, my situation is this:

I've two documents, user and projects, one user can have many projects and one project can have many user, so in my case i've 4 options:

1 - An array of id_user inside project document.

2 - An array of id_project inside user document.

3 - An array of id_user inside project document && An array of id_project inside user document.

4 - A third table mapping user and project relationship(like a relational database).

The option 1 and 2 are unavailable, because, imagine in the scenario of the option 1 if i wanted to find all projects from the user, i will have to look for this user id inside every project document array of the users(traverse this array in every project), this definitely isn't a good approach.

The option 3 is good but i will have to make some kind of transaction to ensure that both documents will be written, it's not that bad, because both documents will be much more read than written

The option 4 is simpler because when i add one user to a project, it's just to add a new document with both id's(it's good solution i think, because i will don't need to care about transaction, it's a good solution?)

So, what's the best solution?

Community
  • 1
  • 1
Rodrigo Fonseca
  • 944
  • 7
  • 21

2 Answers2

9

To the contrary, solution 1 and 2 are your best bet. Solution 3 can be considered when the update/creation frequency is very less compared to read frequency of projects and users as even though to update/create, it requires two queries, the ease of reading will be make up for that.

To choose among solution 1 and 2, you need to consider the read frequencies. Will you need projects of a user or uses of a project more frequently and choose according to that. If you feel both are of relatively the same frequency, it is better to keep the user object as less clustered as possible. Whatever option you choose, consider keeping an index on the array storing the _ids (of projects or users).

For ex.

userSchema = new Schema(
            {//otherstuff
               project_ids: [{type: Schema.Types.ObjectId, ref: 'Project'}})
              ...
            }) 
userSchema.index({'project_ids':1})

or

projectSchema = new Schema(
            {//otherstuff
               user_ids: [{type: Schema.Types.ObjectId, ref: 'User'}})
              ...
            }) 
projectSchema.index({'user_ids':1})

Keeping an index on the array of _id will vastly improve your queries' speed on the side where you fear there will be significant overhead.

But keep the index only if this relation is an important relation with a lot of queries going on. If this is just a side feature of your project, you can do without an index too.

If the user can do lots of stuff and has lots of relations, you will be needing that user object constantly throughout your app, so if your app isn't project specific, it would be better to not put the project ids in the user schema. But as we are just putting the ids, it isn't much of an overhead anyway. No need to worry about that.

Reg index on both the arrays: Yes you can ofcourse. But when you go for solution 3, you don't need an index at all as you won't be doing a query to get the list of projects of a user or the list of users in a project. Solution 3 makes reading very easy but writing a bit cumbersome. But as you mentioned that your use case involves reading>>writing, go with solution 3 but there's always a danger of data inconsistency which you need to take care of.

Indexing just makes things faster. Go through the docs and do a bit of googling. Nothing fancy. Querying over indexed arrays is efficient than normal arrays. For ex. Let us assume you use solution 2. Store the project ids in the project_ids field.

You can get the projects of a user easily. This is straight forward.

But to get users of project1. You need to a query like this.

User.find({project_ids:project._id},function(err,docs){
     //here docs will be the list of the users of project1
})
//The above query might be slow if the user base is large. 
//But it can be improved vastly by indexing the project_ids field in the User schema.

Similary for solution 1. Each project has user_ids field.Let us assume we have a user1. To get the projects of user we do the folowing query

Project.find({user_ids:user1._id},function(err,docs){
      //here docs will be the projects of user1
      //But it can be improved vastly by indexing the user_ids field in the Project schema.

If you are pondering over solution 1 vs solution 2, solution 1 is better I guess. There might be cases where you need user without his projects but the chances of requiring the project without users is pretty low. But it depends on your exact use case.

ma08
  • 3,654
  • 3
  • 23
  • 36
  • "it is better to keep the user object as less clustered as possible" What do you mean with it? And can i use this index in both schemas(projects and users)? – Rodrigo Fonseca Aug 03 '14 at 13:30
  • and solution 1 and 2 aren't good because both ways will be read almost in the same quantity, but they will be written in a very less frequency – Rodrigo Fonseca Aug 03 '14 at 13:34
  • Solution 3, i was thinking about make a transaction, to ensure that both documents are populate with data, like this: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ – Rodrigo Fonseca Aug 03 '14 at 14:00
  • 1
    indexing basically arranges the documents such that the querying on the indexed field is super fast. The 1 indicates ascending index, you can have an descending index by using -1. It doesn't matter really here, but it does incase of numbers and dates. Exactly. Go through the pitfalls of the the transaction(failure scenario i.e inconsistensy) and assess your options. – ma08 Aug 03 '14 at 14:04
  • yes, solution 1 is what i've in mind to use, the problem is, i know that index is faster and efficient like a normal array, the problem is, if i want to look a project for a respective user, i will have to look all projects documents, and inside every project document, traverse user_id array to check, the problem is this, do you understand? But i think that this solution is better for my case. – Rodrigo Fonseca Aug 03 '14 at 14:41
  • @RodrigoFonseca No, you don't do it yourself. Mongo does it for you. The code I have given is essentially the same. Just reverse the user and project. What the query I have given actually does is it checks whether the project1._id is present in the project_ids array of the user and returns all the corresponding users. Just reverse that to get it working for solution1. You don't have to manually traverse yourself, mongo does it for you. Check the edit. – ma08 Aug 03 '14 at 14:55
  • yes i know that mongo you do it for me, but i think that what mongo will do is almost the same thing, it's not magic... Mongo will have to look for all documents and inside every document, mongo will have to traverse the id array to check. I understand your example, thanks again! – Rodrigo Fonseca Aug 03 '14 at 15:14
  • That's why I was shouting INDEXING. Indexing makes this process far more efficient. It stores data so that it doesn't need to go through all the documents. It uses that index to traverse. – ma08 Aug 03 '14 at 15:16
  • yes! i will use index too, i was referring about not using index, but i understood. – Rodrigo Fonseca Aug 03 '14 at 15:19
  • @RodrigoFonseca yes I agree, without any index, that query would be lame, especially for large data sets. – ma08 Aug 03 '14 at 15:33
2

Well solution 1 and 2 are not looking so bad! If you index array of ObjectIds, you can then directly access to what you want.

Solution 3 is looking good too, but 4, not really, you need more queries, except when you have lots of changes between relations of projects and users, but not them itself.

I prefer solution 1 with index on array. I guess lots of times you need to get project object depend on user or directly by id, so .find() will do all you want. I think it's not bad to keep user schema with minimum informations, they looks like isolate objects and you may need them for other purposes.

Foad Nosrati Habibi
  • 1,134
  • 1
  • 8
  • 13
  • ok solution one is good, but if i want to find all projects from a respective User? I will have to traverse user array id object in all projects documents? That's not a good solution i think, and can you give me a brief explanation about how this index works? – Rodrigo Fonseca Aug 03 '14 at 13:38
  • 1
    Take a look at [this](http://docs.mongodb.org/manual/core/index-multikey/). All you need to get those users is `.find({user_ids:YOURID})` and to be sure it is so fast, run it in mongo shell like this: `db.collection.find({user_ids:YOURID}).explain()` and see how long it takes. – Foad Nosrati Habibi Aug 03 '14 at 15:52
  • Multikey indexes? i will use this: [link](http://docs.mongodb.org/manual/indexes/) what's the difference? – Rodrigo Fonseca Aug 03 '14 at 16:32
  • 1
    Multikey indexes is the type of indexing used for arrays. Mongoose will detect type of indexing itself. So don't worry about it. All you did is fine. – Foad Nosrati Habibi Aug 05 '14 at 04:46