I am trying to create a MapReduce function that excludes duplicates from a collection. This is an assignment and I'm new to MongoDB, so I apologize if my code is not very "pretty"; also, for what it's worth, I'm using MongoVUE.
I have a collection called cities, and each document has, among others, a CountryID and a Name field. The first part of the assignment consists of writing a MapReduce function that returns all the city names matching a given country, keeping the duplicates and counting the number of cities.
I solved this with the following setup:
db.runCommand({ mapreduce: "cities",
    map : function Map() {
        emit(
            this.CountryID,
            { "citiesList" : [this.Name], "count" : 1 }
        );
    },
    reduce : function Reduce(key, values) {
        var reduced = {"citiesList" : [], "count" : 0};
        values.forEach(function(val) {
            // val is a { citiesList, count } object (emitted or already reduced),
            // so merge its list of names instead of pushing the object itself
            reduced.citiesList = reduced.citiesList.concat(val.citiesList);
            reduced.count += val.count;
        });
        return reduced;
    },
    finalize : function Finalize(key, reduced) {
        return reduced;
    },
    query : { "CountryID" : 15 },
    out : { inline : 1 }
});
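Because MongoDB may call reduce more than once for the same key and pass earlier reduce results back in as values, reduce has to merge each value's citiesList (rather than pushing the value object itself) and return the same { citiesList, count } shape that map emits. The sketch below runs in plain Node.js, not the mongo shell: the sample documents and the `runMapReduce` helper are hypothetical stand-ins for the `cities` collection and the server, and the helper deliberately reduces in two stages to imitate re-reduce.

```javascript
// Hypothetical sample documents for CountryID 15 (the `query` filter is assumed applied).
const cities = [
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Milan" },
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Turin" },
];

// The server supplies emit() to map; here runMapReduce assigns it.
let emit;

function map() {
  emit(this.CountryID, { citiesList: [this.Name], count: 1 });
}

// Merge each value's list: a value may be an emitted object or an earlier reduce result.
function reduce(key, values) {
  var reduced = { citiesList: [], count: 0 };
  values.forEach(function (val) {
    reduced.citiesList = reduced.citiesList.concat(val.citiesList);
    reduced.count += val.count;
  });
  return reduced;
}

// Hypothetical local stand-in for db.runCommand({ mapreduce: ... }); NOT a MongoDB API.
// Keys with several values are reduced in two halves and then re-reduced.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  const results = [];
  groups.forEach(function (values, key) {
    let value = values[0];
    if (values.length > 1) {
      const mid = Math.ceil(values.length / 2);
      value = reduceFn(key, [reduceFn(key, values.slice(0, mid)),
                             reduceFn(key, values.slice(mid))]);
    }
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value: value });
  });
  return results;
}

const out = runMapReduce(cities, map, reduce);
// prints [{"_id":15,"value":{"citiesList":["Rome","Milan","Rome","Turin"],"count":4}}]
console.log(JSON.stringify(out));
```

Because the two partial results are fed back into reduce, a version that pushed `val` directly would end up with nested objects in citiesList instead of plain names.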
Now I need to improve my answer so that it excludes the duplicates and counts the number of documents in the new collection. I managed to get this information through the console with db.cities.distinct("City", {"CountryID" : 15}); (not supported in MongoVUE, as far as I know), but I can't seem to get a solution with MapReduce (please note that I must use MapReduce, not aggregate).
My idea: add an if condition to my reduce function so that it only pushes values that are not already in my list. That would be something like:
values.forEach(function(val) {
    if(!reduced.citiesList.contains(val)) { // val not contained
        reduced.citiesList.push(val);
        reduced.count += val.count;
    }
});
This doesn't work. I also tried the $in and $exists operators, but I obviously didn't get that right, and MongoVUE isn't much help (I don't get any error message at all).
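A sketch of how the reduce-side idea could be made to work, again using hypothetical sample documents and a hypothetical `runMapReduce` helper in place of MongoDB. Standard JavaScript arrays have no `contains` instance method, so `indexOf(name) === -1` is used for the membership test; each `val` is a { citiesList, count } object, so its names are walked one by one; and `count` is taken from the length of the de-duplicated list, since summing `val.count` would still count duplicates.

```javascript
// Hypothetical sample documents for CountryID 15 (the `query` filter is assumed applied).
const cities = [
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Milan" },
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Turin" },
];

let emit; // supplied by the server in MongoDB; assigned by runMapReduce here

function map() {
  emit(this.CountryID, { citiesList: [this.Name], count: 1 });
}

// Keep only names not seen yet; works for emitted values and earlier reduce results alike.
function reduce(key, values) {
  var reduced = { citiesList: [], count: 0 };
  values.forEach(function (val) {
    val.citiesList.forEach(function (name) {
      if (reduced.citiesList.indexOf(name) === -1) {
        reduced.citiesList.push(name);
      }
    });
  });
  // Count distinct names rather than summing val.count.
  reduced.count = reduced.citiesList.length;
  return reduced;
}

// Hypothetical local stand-in for db.runCommand({ mapreduce: ... }); NOT a MongoDB API.
// Keys with several values are reduced in two halves and then re-reduced.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  const results = [];
  groups.forEach(function (values, key) {
    let value = values[0];
    if (values.length > 1) {
      const mid = Math.ceil(values.length / 2);
      value = reduceFn(key, [reduceFn(key, values.slice(0, mid)),
                             reduceFn(key, values.slice(mid))]);
    }
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value: value });
  });
  return results;
}

const out = runMapReduce(cities, map, reduce);
// prints [{"_id":15,"value":{"citiesList":["Rome","Milan","Turin"],"count":3}}]
console.log(JSON.stringify(out));
```

Deriving `count` from the list length keeps reduce idempotent: re-reducing an already de-duplicated result yields the same list and the same count.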
Alternatively, I thought about iterating through my list in the finalize function and removing the duplicates, but I also couldn't find a way of doing that (NOTE: I want to exclude them from my output, not delete them from the collection).
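The finalize route is also plausible: finalize is applied to each key's result after reducing is finished, so it can drop duplicates from the output without touching the collection. In this sketch reduce keeps every name (merging lists), and finalize builds a de-duplicated copy and recomputes `count`. The sample documents and the `runMapReduce` helper are hypothetical stand-ins for the collection and the server.

```javascript
// Hypothetical sample documents for CountryID 15 (the `query` filter is assumed applied).
const cities = [
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Milan" },
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Turin" },
];

let emit; // supplied by the server in MongoDB; assigned by runMapReduce here

function map() {
  emit(this.CountryID, { citiesList: [this.Name], count: 1 });
}

// Keep duplicates here; merge each value's list of names.
function reduce(key, values) {
  var reduced = { citiesList: [], count: 0 };
  values.forEach(function (val) {
    reduced.citiesList = reduced.citiesList.concat(val.citiesList);
    reduced.count += val.count;
  });
  return reduced;
}

// Build a new list without repeated names; only the output changes, not the collection.
function finalize(key, reduced) {
  var unique = [];
  reduced.citiesList.forEach(function (name) {
    if (unique.indexOf(name) === -1) unique.push(name);
  });
  return { citiesList: unique, count: unique.length };
}

// Hypothetical local stand-in for db.runCommand({ mapreduce: ... }); NOT a MongoDB API.
// Keys with several values are reduced in two halves and then re-reduced.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  const results = [];
  groups.forEach(function (values, key) {
    let value = values[0];
    if (values.length > 1) {
      const mid = Math.ceil(values.length / 2);
      value = reduceFn(key, [reduceFn(key, values.slice(0, mid)),
                             reduceFn(key, values.slice(mid))]);
    }
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value: value });
  });
  return results;
}

const out = runMapReduce(cities, map, reduce, finalize);
// prints [{"_id":15,"value":{"citiesList":["Rome","Milan","Turin"],"count":3}}]
console.log(JSON.stringify(out));
```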
What I'd like to know is:
a) Am I on the right track here, or did I get it all wrong? The assignment has been pretty easy so far, and I might be overlooking a simple solution.
b) Any hints on how to modify my existing solution to make it work?
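One possibly simpler direction, assuming the assignment allows a different key: emit the city name instead of CountryID. MapReduce groups identical keys, so each distinct name comes out as exactly one result document, and the number of results equals the number of distinct names. The sketch runs under the same hypothetical sample documents and `runMapReduce` helper; in the real command, `query : { "CountryID" : 15 }` would do the filtering, and with `out : { inline : 1 }` the distinct count would be the length of the returned `results` array.

```javascript
// Hypothetical sample documents for CountryID 15 (the `query` filter is assumed applied).
const cities = [
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Milan" },
  { CountryID: 15, Name: "Rome" },
  { CountryID: 15, Name: "Turin" },
];

let emit; // supplied by the server in MongoDB; assigned by runMapReduce here

// Key on the name itself so identical names collapse into one group.
function map() {
  emit(this.Name, { count: 1 });
}

// Sum occurrences; returns the same { count } shape that map emits.
function reduce(key, values) {
  var total = 0;
  values.forEach(function (val) { total += val.count; });
  return { count: total };
}

// Hypothetical local stand-in for db.runCommand({ mapreduce: ... }); NOT a MongoDB API.
// Keys with several values are reduced in two halves and then re-reduced.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  const results = [];
  groups.forEach(function (values, key) {
    let value = values[0];
    if (values.length > 1) {
      const mid = Math.ceil(values.length / 2);
      value = reduceFn(key, [reduceFn(key, values.slice(0, mid)),
                             reduceFn(key, values.slice(mid))]);
    }
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value: value });
  });
  return results;
}

const out = runMapReduce(cities, map, reduce);
console.log(out.length); // prints 3: one result per distinct name
```

The trade-off is that the names end up spread across result documents instead of in a single citiesList, so this fits best if the assignment only needs the distinct names and their count.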