
Is there anything similar to a Combiner (as in Hadoop MapReduce) in MongoDB's map-reduce framework? We're trying out the map-reduce framework on a MongoDB cluster, and there are several rows per key that could potentially be combined before being sent to the reduce phase.

1 billion records to map-reduce
Each record is 100 bytes
100 records with the same key (from map) on each node

Wouldn't network bandwidth be a bottleneck for such an operation? I understand this would result in a lot of emits, which could be avoided if there were a mini-reducer (say, a combiner) phase on each node. Or is my understanding incorrect?
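To make the bandwidth concern concrete, here is a minimal Python sketch of what a combiner does, using the numbers above. It is not MongoDB or Hadoop code: the function names `combine` and `reduce_partials` are hypothetical, the example assumes a simple count per key, and the size estimate assumes a combined record is about as large as a single emitted record.

```python
from collections import defaultdict

def combine(pairs):
    """Node-local pre-aggregation: collapse all (key, value) pairs
    emitted on one node into a single partial sum per key."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return dict(partial)

def reduce_partials(partials):
    """Final reduce: merge the per-node partial sums.
    Only these partials would need to cross the network."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

# Figures taken from the question.
RECORDS = 1_000_000_000          # total records to map-reduce
RECORD_BYTES = 100               # size of each record
SAME_KEY_PER_NODE = 100          # records sharing a key on each node

# Without a combiner, every emitted record is shipped to the reducer.
bytes_without_combiner = RECORDS * RECORD_BYTES

# With a combiner, each group of 100 same-key records on a node collapses
# to one record (assumption: the combined record is also ~100 bytes).
bytes_with_combiner = bytes_without_combiner // SAME_KEY_PER_NODE

# Tiny two-node illustration of the combine-then-reduce flow.
node_a = combine([("k1", 1), ("k1", 1), ("k2", 1)])
node_b = combine([("k1", 1), ("k2", 1), ("k2", 1)])
result = reduce_partials([node_a, node_b])
```

Under these assumptions the shuffled volume drops from roughly 100 GB to roughly 1 GB, which is the saving I am hoping a combiner-like phase could provide.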

Is there anything close to the Combiner phase of Hadoop MapReduce? If not, is anything similar planned for later releases?

greedybuddha
Ouroboros
  • What network are you talking about? MapReduce in MongoDB runs on the server - there is no network involved. Or are you talking about a sharded cluster? Please clarify. Reduce *is* the phase that combines results. – Asya Kamsky May 15 '13 at 12:02
  • @AsyaKamsky *Combine* is an intermediate step between *Map* and *Reduce*, see: http://wiki.apache.org/hadoop/HadoopMapReduce – adrianp May 15 '13 at 13:04
  • @AsyaKamsky: I've specified that we're using a cluster. – Ouroboros May 15 '13 at 13:17

1 Answer


To my knowledge, there is no combiner phase in MongoDB's MapReduce implementation. MongoDB implements a somewhat different version of MapReduce than the standard one; if you have performance issues, you are better off using Hadoop.

See also another SO question that discusses the differences between MongoDB MapReduce and Hadoop.

Community
adrianp