
I'm new to MongoDB and MapReduce in general, and I came across this "quirk" (or at least it's a quirk in my mind).

Say I have objects in my collection like so:

{'key':5, 'value':5}

{'key':5, 'value':4}

{'key':5, 'value':1}

{'key':4, 'value':6}

{'key':4, 'value':4}

{'key':3, 'value':0}

My map function simply emits the key and the value.

My reduce function simply adds the values and, before returning the sum, adds 1 (I did this to check whether the reduce function is even called).

My results follow:

{'_id': 3, 'value': 0}

{'_id':4, 'value': 11.0}

{'_id':5, 'value': 11.0}

As you can see, for keys 4 and 5 I get the expected answer of 11, but for key 3 (with only one entry in the collection with that key) I get 0 instead of the 1 I expected!

Is this natural behavior of mapreduce in general? For MongoDB? For pymongo (which I am using)?

IamAlexAlright

5 Answers

The reduce function combines documents with the same key into one document. If the map function emits a single document for a particular key (as is the case with key 3), the reduce function will not be called.
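
The behavior described above can be reproduced without a database. The following is a minimal sketch in plain Python, not MongoDB's actual implementation: `map_reduce`, `map_fn`, and `reduce_fn` are hypothetical names chosen for illustration, and the only rule modeled is that a key with a single emitted value bypasses reduce.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Toy in-memory model: group emitted values by key, then reduce.

    Assumption for illustration: a key with exactly one emitted value is
    passed through untouched, which is the behavior the answer describes.
    """
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)

    results = []
    for key in sorted(groups):
        values = groups[key]
        if len(values) == 1:
            value = values[0]  # reduce is skipped entirely
        else:
            value = reduce_fn(key, values)
        results.append({"_id": key, "value": value})
    return results


def map_fn(doc):
    # Emit the key and the value, as in the question.
    yield doc["key"], doc["value"]


def reduce_fn(key, values):
    # Sum the values, then add 1 to reveal whether reduce ran.
    return sum(values) + 1


docs = [
    {"key": 5, "value": 5},
    {"key": 5, "value": 4},
    {"key": 5, "value": 1},
    {"key": 4, "value": 6},
    {"key": 4, "value": 4},
    {"key": 3, "value": 0},
]

results = map_reduce(docs, map_fn, reduce_fn)
for row in results:
    print(row)
```

Keys 4 and 5 each come out as 11 because reduce ran once on them, while key 3 keeps its emitted value of 0 because reduce never saw it.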

Jenna
  • Just to be clear, this is the way map reduce was designed. If you would like to modify documents with unique keys (like key 3), consider using the finalize function: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-FinalizeFunction – Jenna Jun 13 '12 at 19:39
  • What's the solution if we want to include keys with a single document in the results? – Ravi Khakhkhar Feb 25 '13 at 10:51
  • @RaviKhakhkhar The single documents are still included in the results; the reduce function is just never called on them. – Cilvic Jan 10 '14 at 18:39
  • Thanks! You saved my day :) – pankajt May 07 '16 at 15:05
  • @Jenna, you said this is the way map reduce was designed. Do you have any reference that explains the design philosophy behind this? – palazzo train Mar 16 '18 at 07:33

I realize this is an older question, but I came to it and felt like I still didn't understand why this behavior exists and how to build map/reduce functionality so it's a non-issue.

MongoDB doesn't call the reduce function when there is only a single instance of a key because it isn't necessary (I hope this will make more sense in a moment). The following are requirements for reduce functions:

  • The reduce function must return an object whose type must be identical to the type of the value emitted by the map function.
  • The order of the elements in the valuesArray should not affect the output of the reduce function.
  • The reduce function must be idempotent.

The first requirement is very important, and it seems many people overlook it: I've seen a number of people do their mapping in the reduce function and then deal with the single-key case in the finalize function. This is the wrong way to address the issue.

Think about it like this: if there's only a single instance of a key, a simple optimization is to skip the reducer entirely (there's nothing to reduce). Values for such keys are still included in the output, but the intent of the reducer is to build an aggregate result for keys that have multiple values in your collection. If the mapper and reducer output the same type, you should not be able to tell, from the object structure of the output, which values went through the reducer and which did not. You shouldn't have to use a finalize function to correct the structure of objects that didn't run through the reducer.

In short, do your mapping in your map function and reduce multi-key values into a single, aggregate result in your reduce functions.
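
The three requirements listed above can be checked mechanically. The sketch below is plain Python rather than MongoDB code, and every function name in it is hypothetical. It compares a plain sum reducer with the "sum plus 1" reducer from the question, taking idempotence to mean that reducing an already reduced value gives the same result.

```python
from itertools import permutations


def sum_reduce(key, values):
    # Well-behaved reducer: returns the same type that map emits.
    return sum(values)


def sum_plus_one_reduce(key, values):
    # The question's reducer: adds 1 every time it runs.
    return sum(values) + 1


def keeps_type(reduce_fn, key, values):
    # Requirement 1: output type matches the type of the emitted values.
    result = reduce_fn(key, values)
    return all(type(result) is type(v) for v in values)


def is_order_independent(reduce_fn, key, values):
    # Requirement 2: any ordering of the values gives the same result.
    expected = reduce_fn(key, list(values))
    return all(reduce_fn(key, list(p)) == expected for p in permutations(values))


def is_idempotent(reduce_fn, key, values):
    # Requirement 3: reduce(key, [reduce(key, values)]) == reduce(key, values)
    once = reduce_fn(key, values)
    return reduce_fn(key, [once]) == once


values = [5, 4, 1]
for fn in (sum_reduce, sum_plus_one_reduce):
    print(
        fn.__name__,
        keeps_type(fn, 5, values),
        is_order_independent(fn, 5, values),
        is_idempotent(fn, 5, values),
    )
```

Under these assumptions the plain sum passes all three checks, while the "plus 1" reducer fails the idempotence check, which is why its output depends on whether (and how often) reduce is invoked.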

senfo

Solution:

  • In map, add a new field to the emitted value: single: 0
  • In reduce, set this field to: single: 1
  • In finalize, check this field and take whatever action is required for values that never went through reduce

    $map = new MongoCode("function() {
        var value = {
            time: this.time,
            email_id: this.email_id,
            single: 0
        };
    
        emit(this.email, value);
    }");
    
    $reduce = new MongoCode("function(k, vals) {
    
        // perform any needed processing here
        return {
            time: vals[0].time,
            email_id: vals[0].email_id,
            single: 1
        };
    }");
    
    $finalize = new MongoCode("function(key, reducedVal) {
        if (reducedVal.single == 0) {
            reducedVal.time = 11111;
        }
        return reducedVal;
    }");
    
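
To see how the flag travels through the pipeline, here is a rough Python model of the same idea. It is a sketch only: `map_reduce` is a hypothetical in-memory stand-in for the database (assuming reduce is skipped for keys with one value), and the sample documents are invented for illustration.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn):
    # Hypothetical stand-in for the database: reduce is skipped when a key
    # has only one emitted value, and finalize always runs.
    groups = defaultdict(list)
    for doc in docs:
        key, value = map_fn(doc)
        groups[key].append(value)

    out = {}
    for key, values in groups.items():
        reduced = values[0] if len(values) == 1 else reduce_fn(key, values)
        out[key] = finalize_fn(key, reduced)
    return out


def map_fn(doc):
    # Emit with single: 0, meaning "has not been through reduce".
    value = {"time": doc["time"], "email_id": doc["email_id"], "single": 0}
    return doc["email"], value


def reduce_fn(key, vals):
    # Mark the aggregate with single: 1, meaning "reduce ran".
    return {"time": vals[0]["time"], "email_id": vals[0]["email_id"], "single": 1}


def finalize_fn(key, reduced):
    # Values that never reached reduce still carry single: 0.
    if reduced["single"] == 0:
        reduced = {**reduced, "time": 11111}
    return reduced


# Invented sample data: one email appears twice, the other once.
docs = [
    {"email": "a@example.com", "time": 10, "email_id": 1},
    {"email": "a@example.com", "time": 20, "email_id": 2},
    {"email": "b@example.com", "time": 30, "email_id": 3},
]

results = map_reduce(docs, map_fn, reduce_fn, finalize_fn)
for key, value in results.items():
    print(key, value)
```

Only the key with a single document reaches finalize with `single` still at 0, so only that value is adjusted.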

"MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are “mapped” to the key."

http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-reduce-cmd

ftaher

Is this natural behavior of mapreduce in general?

Yes.

Marco Sero
  • No - this is not natural for "MR in general". Neither the original MR paper nor Hadoop MapReduce does this. You might want to convert that "1" to another type in the reducer, right? So in general, skipping the reducer would be a pretty bad/weird idea ;-) This doesn't mean that mongo's MR doesn't do it - but it's not "the expected behavior _in general_". – Konrad 'ktoso' Malawski Jan 31 '13 at 16:27
  • Here http://docs.mongodb.org/manual/tutorial/troubleshoot-reduce-function/ Mongo says that reduce has to return a value of the same type that map emits. However, I agree it's bad, quirky, unexpected and unclear. – amorfis Jul 09 '13 at 18:38