
While looking at the documentation for map-reduce, I found this:

NOTE:

For most aggregation operations, the Aggregation Pipeline provides better performance and a more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.

I did not understand much from it.

  • What are the use cases for using map-reduce over aggregation pipeline?
  • What flexibility does map-reduce provide?
  • How big is the performance difference?

1 Answer


For one thing, Map/Reduce in MongoDB wasn't made for ad-hoc queries; there's considerable overhead to M/R. Even a very simple M/R operation on a small dataset can take hundreds of milliseconds because of that overhead.

I can't say much about the performance of M/R compared to the aggregation framework on large datasets in practice, but in theory, M/R operations on a large sharded database should be faster since the shards can run the operations largely in parallel.

As to the flexibility: since M/R actually runs JavaScript functions, you have the full power of the language at your disposal. For example, say you wanted to group some data by the cosine of a field's value. Since there is neither a $cos operator in the aggregation framework nor a meaningful way to build discrete buckets from continuous numbers (something like $truncate), the aggregation framework wouldn't help in that case, but map-reduce would, as sketched below.
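A minimal sketch of that cosine grouping in the mongo shell; the collection name `readings` and the numeric field `angle` are assumptions made purely for illustration:

```
// Hypothetical collection "readings" with a numeric field "angle".
// Group documents by the cosine of "angle", rounded to two decimals
// to turn a continuous value into discrete buckets.
var mapFn = function () {
    var bucket = Math.round(Math.cos(this.angle) * 100) / 100;
    emit(bucket, 1);                   // one count per document
};

var reduceFn = function (key, values) {
    return Array.sum(values);          // total documents per cosine bucket
};

db.readings.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });
```

With `out: { inline: 1 }` the result set is returned directly instead of being written to a collection, which is fine for a small example like this.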

So, in a nutshell, I'd say the use cases are:

  • Keeping the results of M/R in a separate collection and updating it from time to time (using the out parameter and merging the results; see the sketch after this list)
  • Complex queries on large sharded data sets
  • Queries that are so complex that you can't use the aggregation framework. I'd say that's a pretty certain sign of a design flaw in the data structure, but in principle, it can help
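For the first use case, here is a minimal sketch of such an incremental update in the mongo shell; the collection `events`, the fields `userId`, `amount` and `createdAt`, the output collection `user_totals`, and the `lastRunTimestamp` cutoff are all assumptions for illustration:

```
// Hypothetical: sum "amount" per "userId" and fold the result into an
// existing output collection, only processing documents newer than the
// previous run.
var lastRunTimestamp = ISODate("2013-01-01T00:00:00Z");  // assumed cutoff

var mapFn = function () {
    emit(this.userId, this.amount);
};

var reduceFn = function (key, values) {
    return Array.sum(values);
};

db.events.mapReduce(mapFn, reduceFn, {
    query: { createdAt: { $gt: lastRunTimestamp } },
    // out: { reduce: ... } re-reduces the new values with the ones already
    // stored in "user_totals"; out: { merge: ... } would instead overwrite
    // matching keys with the freshly computed values.
    out: { reduce: "user_totals" }
});
```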