Denormalization vs Parent Referencing vs MapReduce

Question

I have a highly normalized data model with me. Currently I'm using manual referencing by storing the _id and running sequential queries to fetch details from the deepest collection.

The referencing is one-way and the flow has around 5-6 collections. For one particular use case, I'm having to query down to the deepest collection by querying subsequent "_id" from the higher level collections. So technically I'm hitting the database every time I run a

db.collection_name.find(_id: ****).

My prime goal is to optimize the read without hugely affecting the atomicity of the other collections. I have read about de-normalization and it does not make sense to me because I want to keep an option for changing the cardinality down the line and hence want to maintain a separate collection altogether.

I was initially thinking of using MapReduce to do an aggregation from the back and have a collection primarily for the particular use-case. But well even that does not sound that good.

In a relational db, I would be breaking the query in sub-queries and performing a join to get the data sets that intersect from the initial results. Since mongodb does not support joins, I'm having a tough time figuring anything out.

Please help if you have faced anything like this earlier or have any idea how to resolve it.

score 2 · Accepted Answer · answered Jul 15 '15 at 08:43

Denormalize your data.

MongoDB does not do JOIN's - period.

There is no operation on the database which gets data from more than one collection. Not find(), not aggregate() and not MapReduce. When you need to puzzle your data together from more than one collection, there is no other way than doing it on the application layer. For that reason you should organize your data in a way that any common and performance-relevant query can be resolved by querying just a single collection.

In order to do that you might have to create redundancies and transitive dependencies. This is normal in MongoDB.

When this feels "dirty" to you, then you should either accept the fact that your performance will be sub-optimal or use a different kind of database, like a classic relational database or a graph database.

Denormalization vs Parent Referencing vs MapReduce

1 Answers1