
I am trying to create a MapReduce function that excludes duplicates from a collection. This is an assignment and I'm new to MongoDB, so I apologize if my code is not very "pretty"; for what it's worth, I'm using MongoVUE.

I have a collection called cities, and each document has, among others, a CountryID and a Name field. The first part of the assignment consists of writing a MapReduce function that returns all the city names for a given country, keeping the duplicates and counting the number of cities.

I solved this with the following setup:

db.runCommand({ mapreduce: "cities", 
 map : function Map() {

    emit(
        this.CountryID,
        { "citiesList" : [this.Name], "count" : 1 }
    );
},
 reduce : function Reduce(key, values) {

    var reduced = {"citiesList" : [], "count" : 0};

    values.forEach(function(val) {
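        // Each val is an emitted object, so merge its citiesList instead of pushing val itself.
        reduced.citiesList = reduced.citiesList.concat(val.citiesList);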
        reduced.count += val.count;
    });

    return reduced; 
},
 finalize : function Finalize(key, reduced) {

    return reduced;
},
 query : { "CountryID" : 15 },
 out : { inline : 1 }
 });

Now I need to improve my answer so that it excludes the duplicates and counts the number of documents in the resulting collection. I managed to get this information through the console with db.cities.distinct("Name", {"CountryID" : 15}); (not supported in MongoVUE, as far as I know), but I can't find a solution with MapReduce (please note that I must use MapReduce, not aggregate).

My idea: add an if condition to my reduce function, so that it only pushes values that are not already in my list. That would be something like:

    values.forEach(function(val) {
        if (!reduced.citiesList.contains(val)) { // val not contained
            reduced.citiesList.push(val);
            reduced.count += val.count;
        }
    });

This doesn't work. I also tried the $in and $exists operators, but I obviously didn't get them right, and MongoVUE isn't really helping (I don't get any error message at all).
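To make the idea concrete, here is a minimal sketch in plain JavaScript that can be tried outside MongoDB and MongoVUE. It assumes the value shape emitted by the map function above; reduceDistinct and the sample data are hypothetical names and values used only for illustration. Two details differ from the snippet above: each val is an emitted object rather than a city name, so its citiesList has to be walked, and JavaScript arrays offer indexOf rather than contains.

```javascript
// Sketch only: plain JavaScript, runnable outside MongoDB.
// reduceDistinct is a hypothetical name; the value shape matches the map step above.
function reduceDistinct(key, values) {
    var reduced = { "citiesList": [], "count": 0 };

    values.forEach(function (val) {
        // Each val is an object { citiesList: [...], count: n }, not a city name.
        val.citiesList.forEach(function (name) {
            // indexOf returns -1 when the name is not in the list yet.
            if (reduced.citiesList.indexOf(name) === -1) {
                reduced.citiesList.push(name);
            }
        });
    });

    // Count the distinct names instead of summing the incoming counts.
    reduced.count = reduced.citiesList.length;
    return reduced;
}

// Made-up sample values, shaped like the output of the map function.
var sample = [
    { "citiesList": ["Rome"], "count": 1 },
    { "citiesList": ["Milan"], "count": 1 },
    { "citiesList": ["Rome"], "count": 1 }
];

var once = reduceDistinct(15, sample);
// Reduce may be called again on earlier results, so re-reducing must stay correct.
var twice = reduceDistinct(15, [once, { "citiesList": ["Milan", "Turin"], "count": 2 }]);

console.log(JSON.stringify(once));
console.log(JSON.stringify(twice));
```

Deriving count from the length of the merged list keeps the result consistent when the output of one reduce call is fed into another.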

Alternatively, I thought about iterating through my list in the finalize function and removing the duplicates there, but I couldn't find a way of doing that either (note: I want to exclude them from my output, not delete them from the collection).
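The finalize variant could look roughly like the sketch below, again in plain JavaScript so it runs outside MongoDB. It assumes that by the time finalize is called, citiesList is a flat array of city names; finalizeDistinct and the sample input are hypothetical and only illustrate the idea.

```javascript
// Sketch only: removes duplicate names from an already-reduced value.
// finalizeDistinct is a hypothetical name; it assumes reduced.citiesList
// is a flat array of strings.
function finalizeDistinct(key, reduced) {
    // Lookup table without a prototype, so any city name is a safe key.
    var seen = Object.create(null);
    var unique = [];

    reduced.citiesList.forEach(function (name) {
        if (seen[name] !== true) {
            seen[name] = true;
            unique.push(name);
        }
    });

    // Return a new value; the documents in the collection are not touched.
    return { "citiesList": unique, "count": unique.length };
}

// Made-up reduced value containing duplicates.
var reducedSample = { "citiesList": ["Rome", "Milan", "Rome", "Turin", "Milan"], "count": 5 };
var finalized = finalizeDistinct(15, reducedSample);

console.log(JSON.stringify(finalized));
```

Because the function builds a new object, the duplicates disappear only from the output, which matches the requirement of not deleting anything from the collection.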

What I'd like to know is:

a) Am I on the right track here, or did I get it all wrong? The assignment has been pretty easy so far, so I might be overlooking a simple solution.

b) Any hints on how to modify my existing solution to make it work?
