I am having some problems with mapReduce.

I want to group, sort, and count some values in a collection. I have a collection such as:

-----------------------------
| item_id    | date         |
-----------------------------
| 1          | 01/15/2012   |
-----------------------------
| 2          | 01/01/2012   |
-----------------------------
| 1          | 01/15/2012   |
-----------------------------
| 1          | 01/01/2012   |
-----------------------------
| 2          | 01/03/2012   |
-----------------------------
| 2          | 01/03/2012   |
-----------------------------
| 1          | 01/01/2012   |
-----------------------------
| 1          | 01/01/2012   |
-----------------------------
| 2          | 01/01/2012   |
-----------------------------
| 2          | 01/01/2012   |
-----------------------------

I want to group by item_id, count the occurrences of each date per item, sort the dates within each item, and get a result such as:

value: {{item_id:1, date:{01/01/2012:3, 01/15/2012:2 }},{item_id:2, date:{01/01/2012:3, 01/03/2012:2 }}}

I use the following mapReduce functions:

m=function()
{
    emit(this.item_id, this.date);
}
r=function(key, values)
{
    var res={};
    values.forEach(function(v)
    {
        // Count how many times each date value occurs
        res[v] = (typeof res[v]!='undefined') ? res[v]+1 : 1;
    });
    return res;
}

But I don't get a result such as:

{{item_id:1, date:{01/01/2012:3, 01/15/2012:2 }},{item_id:2, date:{01/01/2012:3, 01/03/2012:2 }}}

Any ideas?

ekad

1 Answer


Given input documents of the form:

> db.dates.findOne()
{ "_id" : 1, "item_id" : 1, "date" : "1/15/2012" }
> 

The following map and reduce functions should produce the output that you are looking for:

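var map = function(){
    // Emit one partial result per document: a count of 1 for this document's date.
    var myDate = this.date;
    var value = {"item_id":this.item_id, "date":{}};
    value.date[myDate] = 1;
    emit(this.item_id, value);
}

var reduce = function(key, values){
    // Return the same shape that map emits, because MongoDB may pass
    // the output of an earlier reduce call back in as one of the values.
    var output = {"item_id":key, "date":{}};
    for(var v in values){
        for(var thisDate in values[v].date){
            if(output.date[thisDate] == null){
                // First time this date is seen: start from the incoming count, not 1.
                output.date[thisDate] = values[v].date[thisDate];
            }else{
                output.date[thisDate] += values[v].date[thisDate];
            }
        }
    }
    return output;
}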

> db.runCommand({"mapReduce":"dates", map:map, reduce:reduce, out:{replace:"dates_output"}})

> db.dates_output.find()
{ "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/15/2012" : 2, "1/01/2012" : 3 } } }
{ "_id" : 2, "value" : { "item_id" : 2, "date" : { "1/01/2012" : 3, "1/03/2012" : 2 } } }

Hopefully the above will do what you need it to, or at least get you pointed in the right direction.
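
To check the grouping logic without a MongoDB instance, the map and reduce steps can be approximated in plain JavaScript. This is only a rough sketch: docs, mapDoc, and simulate are hypothetical names introduced for illustration, and the batching is just an imitation of how the server may call reduce several times for the same key. The reduce function here adds the incoming count for each date, so it gives the same totals whether it sees raw map output or partial results.

```javascript
// Sample documents mirroring the collection in the question (assumed shape).
var docs = [
    { item_id: 1, date: "1/15/2012" }, { item_id: 2, date: "1/01/2012" },
    { item_id: 1, date: "1/15/2012" }, { item_id: 1, date: "1/01/2012" },
    { item_id: 2, date: "1/03/2012" }, { item_id: 2, date: "1/03/2012" },
    { item_id: 1, date: "1/01/2012" }, { item_id: 1, date: "1/01/2012" },
    { item_id: 2, date: "1/01/2012" }, { item_id: 2, date: "1/01/2012" }
];

// Stand-in for the map function: returns the value instead of calling emit().
function mapDoc(doc) {
    var value = { item_id: doc.item_id, date: {} };
    value.date[doc.date] = 1;
    return value;
}

// Sums the per-date counts; the output has the same shape as the input values.
function reduce(key, values) {
    var output = { item_id: key, date: {} };
    values.forEach(function (v) {
        Object.keys(v.date).forEach(function (d) {
            output.date[d] = (output.date[d] || 0) + v.date[d];
        });
    });
    return output;
}

// Hypothetical driver: groups mapped values by key, then reduces them in
// batches of batchSize (must be >= 2), feeding each partial result back in.
function simulate(input, batchSize) {
    var groups = {};
    input.forEach(function (doc) {
        (groups[doc.item_id] = groups[doc.item_id] || []).push(mapDoc(doc));
    });
    var results = {};
    Object.keys(groups).forEach(function (key) {
        var pending = groups[key];
        while (pending.length > 1) {
            var partial = reduce(Number(key), pending.slice(0, batchSize));
            pending = [partial].concat(pending.slice(batchSize));
        }
        results[key] = pending[0];
    });
    return results;
}

console.log(JSON.stringify(simulate(docs, 2)));   // reduced two values at a time
console.log(JSON.stringify(simulate(docs, 100))); // reduced in a single call
```

Both calls should report the same counts per item. If they differ, the reduce function is not safe to run on its own output.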

For more information on using Map Reduce with MongoDB, please see the Mongo Documentation: http://www.mongodb.org/display/DOCS/MapReduce

There are some additional Map Reduce examples in the MongoDB Cookbook: http://cookbook.mongodb.org/

For a step-by-step walkthrough of how a Map Reduce operation is run, please see the "Extras" section of the MongoDB Cookbook recipe "Finding Max And Min Values with Versioned Documents" http://cookbook.mongodb.org/patterns/finding_max_and_min/

Good luck!

Marc
  • Marc, thanks for your help. I now have a problem with sorting the dates. I want output such as: { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 3, "1/15/2012" : 2 } } }. I added sort:{date:1} to the command: db.runCommand({"mapReduce":"dates", map:map, reduce:reduce,sort:{date:1}, out:{replace:"dates_output"}}). But after this operation I get: { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 1, "1/15/2012" : 1 } } }. The dates are sorted, but the count is always 1. – Sergey Kleimenov Apr 03 '12 at 14:51
  • Hello. I am happy that I was able to assist! The embedded elements inside of "value" are added in whatever order they are discovered. Sorting the input on the date key would put the dates in order in the embedded document. However, in this example, the dates are strings, and there is no guarantee that a string with a later date has a greater value than a string with an earlier date. For example, in the js shell, > "10/01/2012" < "1/01/2013" returns false. The strings will have to first be converted to dates in order to be properly compared. – Marc Apr 04 '12 at 16:12
  • Once you have taken care of this, you should be able to get the results in the order that you desire: > db.runCommand({"mapReduce":"dates", map:map, reduce:reduce, sort:{"date":1}, out:{replace:"dates_output"}}) If you receive an error similar to "exception: could not create cursor over test.dates for query", try adding an index to the "date" key in the input collection. – Marc Apr 04 '12 at 16:12
  • Hi Marc, can you help with another problem? I have a collection that contains the fields id_1 (ObjectId), id_2 (ObjectId) and date (Date). When I use your mapReduce with id_1, I get a good result: { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 3, "1/15/2012" : 2 } } }, but when I use id_2 I get: { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 1, "1/15/2012" : 1 } } }, where the count for each date is always 1. The two fields do not differ and I don't use any sorting. Where could I be mistaken? – Sergey Kleimenov Apr 11 '12 at 16:05
  • I have this problem when the collection contains only two different record IDs. My result is then: id_2 : { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 1, "1/15/2012" : 1 } } }, BUT if I add one more record, my result is: { "_id" : 1, "value" : { "item_id" : 1, "date" : { "1/01/2012" : 3, "1/15/2012" : 2 } } }. I don't understand why it don't work if records for "group by"< 3 – Sergey Kleimenov Apr 12 '12 at 11:23
  • I am not sure I understand your question. Are you changing the format of the input documents? If so, you will have to modify the map and reduce functions to incorporate the new fields that you would like to sort by, "id_1" and "id_2". I am also afraid that I don't understand what you mean by "don't work if records for 'group by'< 3". If your input collection is in the same format outlined in the example { "_id" : 1, "item_id" : 1, "date" : "1/15/2012" }, the map/reduce operation should work with any sized collection. – Marc Apr 12 '12 at 17:03
  • Unfortunately, the comments section is difficult to provide help in because of the formatting and length limitations. If you are still having an issue, can you please create a new question with sample documents and exactly the steps that you are taking so I may attempt to reproduce? If you do create a new issue, please place a link to it here, so that I, or any other users who may be experiencing a similar issue may find it easily. Thanks. – Marc Apr 12 '12 at 17:03
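
The string-comparison pitfall raised in the comments above can be sketched in plain JavaScript. This is a minimal illustration, assuming the dates are stored as "M/D/YYYY" strings as in the sample documents; parseUsDate and sortDateCounts are hypothetical helper names, not part of MongoDB, and the code would need adapting before use inside a server-side function.

```javascript
// Strings are compared character by character, so a later date can sort first.
console.log("10/01/2012" < "1/01/2013"); // false

// Hypothetical helper: turn an "M/D/YYYY" string into a Date object.
function parseUsDate(s) {
    var parts = s.split("/").map(Number);
    return new Date(parts[2], parts[0] - 1, parts[1]); // months are 0-based
}

// Compared as dates, the ordering is chronological.
console.log(parseUsDate("10/01/2012") < parseUsDate("1/01/2013")); // true

// Hypothetical helper: rebuild a { dateString: count } object with its keys
// inserted in chronological order.
function sortDateCounts(counts) {
    var sorted = {};
    Object.keys(counts)
        .sort(function (a, b) { return parseUsDate(a) - parseUsDate(b); })
        .forEach(function (k) { sorted[k] = counts[k]; });
    return sorted;
}

console.log(JSON.stringify(sortDateCounts({ "1/15/2012": 2, "1/01/2012": 3 })));
```

Storing real Date values, or strings in a year-first format such as "2012-01-15", is another option, since those compare in chronological order without a conversion step.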