
While looking at the documentation for map-reduce, I found this:

NOTE:

For most aggregation operations, the Aggregation Pipeline provides better performance and a more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.

I did not understand much from it.

  • What are the use cases for using map-reduce over aggregation pipeline?
  • What flexibility does map-reduce provide?
  • How big is the performance difference?

1 Answer


For one thing, Map/Reduce in MongoDB wasn't made for ad-hoc queries; there's considerable overhead to M/R. Even a very simple M/R operation on a small dataset can take hundreds of milliseconds because of that overhead.

I can't say much about the performance of M/R compared to the aggregation framework on large datasets in practice, but in theory, M/R operations on a large sharded database should be faster since the shards can run the operations largely in parallel.

As to the flexibility: since M/R actually runs JavaScript functions, you have the full power of the language at your disposal. For example, say you wanted to group some data by the cosine of a field's value. Since there is neither a $cos operator in the aggregation framework nor a meaningful way to build discrete buckets from continuous numbers (something like $truncate), the aggregation framework wouldn't help in that case, but map-reduce would, as sketched below.
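A minimal sketch of that cosine grouping in the mongo shell; the collection name `readings` and the numeric field `angle` are assumptions made purely for illustration:

```
// Hypothetical collection "readings" with a numeric field "angle".
// Group documents by the cosine of "angle", rounded to two decimals
// to turn a continuous value into discrete buckets.
var mapFn = function () {
    var bucket = Math.round(Math.cos(this.angle) * 100) / 100;
    emit(bucket, 1);                   // one count per document
};

var reduceFn = function (key, values) {
    return Array.sum(values);          // total documents per cosine bucket
};

db.readings.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });
```

With `out: { inline: 1 }` the result set is returned directly instead of being written to a collection, which is fine for a small example like this.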

So, in a nutshell, I'd say the use cases are:

  • Keeping the results of M/R in a separate collection and updating it from time to time (using the out parameter and merging the results; see the sketch after this list)
  • Complex queries on large sharded data sets
  • Queries that are so complex that you can't use the aggregation framework. I'd say that's a pretty certain sign of a design flaw in the data structure, but in principle, it can help
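For the first use case, here is a minimal sketch of such an incremental update in the mongo shell; the collection `events`, the fields `userId`, `amount` and `createdAt`, the output collection `user_totals`, and the `lastRunTimestamp` cutoff are all assumptions for illustration:

```
// Hypothetical: sum "amount" per "userId" and fold the result into an
// existing output collection, only processing documents newer than the
// previous run.
var lastRunTimestamp = ISODate("2013-01-01T00:00:00Z");  // assumed cutoff

var mapFn = function () {
    emit(this.userId, this.amount);
};

var reduceFn = function (key, values) {
    return Array.sum(values);
};

db.events.mapReduce(mapFn, reduceFn, {
    query: { createdAt: { $gt: lastRunTimestamp } },
    // out: { reduce: ... } re-reduces the new values with the ones already
    // stored in "user_totals"; out: { merge: ... } would instead overwrite
    // matching keys with the freshly computed values.
    out: { reduce: "user_totals" }
});
```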