
I am new to MapReduce (MR) jobs in MongoDB. I have an aggregation query that looks like this:

db.acollection.aggregate([{$match:{ "userId" : { "$eq" : "raghu" }}}, 
{$group:{ "_id" : { "region":"$region", "shipMode" : "$shipMode"}, "sales" : { "$sum" : "$sales"}}}, 
{"$sort" : { "_id.region" : 1, "sales" : 1 }}, { "$limit" : 1000}]);
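As a point of reference, here is a plain-Java, in-memory sketch (no MongoDB driver involved) of what a MapReduce job equivalent to this pipeline has to compute. The `Sale`, `RegionShipMode`, and `MapReduceSketch` names are hypothetical. Map filters on `userId` and emits a composite `(region, shipMode)` key with the sales value, the shuffle collects the values per key, and reduce only sums the values of one key. In this model the sort by `sales` and the limit of 1000 are applied to the reduce output afterwards, because reduce never sees other keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one document in acollection.
record Sale(String userId, String region, String shipMode, double sales) {}

// Composite key, playing the role of _id: { region, shipMode } in the $group stage.
record RegionShipMode(String region, String shipMode) {}

final class MapReduceSketch {
    private MapReduceSketch() {}

    // Map: keep documents for the given user and emit (key, sales) pairs.
    static List<Map.Entry<RegionShipMode, Double>> map(List<Sale> docs, String userId) {
        List<Map.Entry<RegionShipMode, Double>> emitted = new ArrayList<>();
        for (Sale s : docs) {
            if (s.userId().equals(userId)) {
                emitted.add(Map.entry(new RegionShipMode(s.region(), s.shipMode()), s.sales()));
            }
        }
        return emitted;
    }

    // Shuffle: collect every emitted value under its key.
    static Map<RegionShipMode, List<Double>> shuffle(List<Map.Entry<RegionShipMode, Double>> emitted) {
        Map<RegionShipMode, List<Double>> grouped = new LinkedHashMap<>();
        for (Map.Entry<RegionShipMode, Double> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Reduce: combine the values of a single key (the $sum of $sales).
    static double reduce(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Whole job: map, shuffle, reduce, then sort and limit the reduce output.
    static List<Map.Entry<RegionShipMode, Double>> run(List<Sale> docs, String userId, int limit) {
        List<Map.Entry<RegionShipMode, Double>> out = new ArrayList<>();
        for (Map.Entry<RegionShipMode, List<Double>> g : shuffle(map(docs, userId)).entrySet()) {
            out.add(Map.entry(g.getKey(), reduce(g.getValue())));
        }
        // Same order as {"_id.region": 1, "sales": 1}.
        out.sort((a, b) -> {
            int byRegion = a.getKey().region().compareTo(b.getKey().region());
            return byRegion != 0 ? byRegion : Double.compare(a.getValue(), b.getValue());
        });
        return new ArrayList<>(out.subList(0, Math.min(limit, out.size())));
    }
}
```

Because the key is composite, grouping on several fields still takes a single pass; there is no need for one job per field.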

Due to performance issues (see "MongoDB performing slow under load"), I am creating an MR job. My understanding is that Map should select all the relevant documents and Reduce should then group, sort, and limit them. I have a function like the one below:

final MongoDatabase mongoDatabase = MongoUtils.getMongoDatabase(model);
BasicDBObject obj = pipeline.get(1);
MapReduceIterable<Document> list = mongoDatabase.getCollection(collectionName).mapReduce(getMapFunction(obj.getString("userId")), getReduceFunction());
// The code above is the main call, but I am mostly concerned with the map and reduce functions.

private String getMapFunction(String whereCondition) {

    StringBuilder map = new StringBuilder();
    map.append("function() {"
            + "var key='" + whereCondition + "';"
            + "if (this.userId == key) {"
            // how do I get all the documents for this key?
            + "}"
            + "}");
    return map.toString();
}


private String getReduceFunction() {

    String reduce = "";
    // what should go here?
    return reduce;
}
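A minimal sketch of the two JavaScript strings, assuming the goal is the same grouping as the aggregation (key `{region, shipMode}`, value `sales`). `MrFunctions` is a hypothetical helper class. Note that if `whereCondition` is written inside the string literal, the script sees an undefined JavaScript variable rather than the user id; the overload below quotes the value into the script instead. With the Java driver, the `userId` condition can usually be passed as a query via `MapReduceIterable.filter(...)` (for example `Filters.eq("userId", userId)`), which keeps it out of the map function entirely.

```java
// Hypothetical helper that builds the JavaScript source strings passed to
// MongoCollection.mapReduce(String mapFunction, String reduceFunction).
final class MrFunctions {
    private MrFunctions() {}

    // Emit a composite key {region, shipMode} with the document's sales as the value.
    // Assumes the userId condition is applied as a query filter outside map.
    static String mapFunction() {
        return "function() {"
             + " emit({ region: this.region, shipMode: this.shipMode }, this.sales);"
             + " }";
    }

    // Sum the values emitted for one key. The result is a number, like the emitted
    // values, so it stays valid if MongoDB re-reduces it together with more values.
    static String reduceFunction() {
        return "function(key, values) { return Array.sum(values); }";
    }

    // Variant with the filter inside map: the user id is embedded as a quoted
    // JavaScript string literal (minimal escaping of backslashes and double quotes).
    static String mapFunction(String userId) {
        String quoted = "\"" + userId.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return "function() {"
             + " if (this.userId == " + quoted + ") {"
             + " emit({ region: this.region, shipMode: this.shipMode }, this.sales);"
             + " } }";
    }
}
```

Note that the `sort` and `limit` options of `mapReduce` apply to the input documents, not to the reduced output, so ordering by the summed `sales` still has to happen afterwards (for example on the output collection).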

I am not sure how to arrive at the JavaScript code. I want to emit the complete JSON object as the value so that I can run MapReduce over it. Something like:

private String getMapFunction() {

//somecode here and then finally ..
  emit(this.tenantId_V, object);
}
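Emitting the whole document as the value is possible, but MongoDB's mapReduce may call reduce several times for the same key and feed an earlier reduce result back in as one of the `values`, so reduce must return an object of the same shape as the values map emits. A common approach is therefore to emit only the fields reduce needs. The Java sketch below models that rule with a hypothetical `SalesValue` type (in JavaScript this would correspond to emitting something like `{ sales: this.sales, count: 1 }`): reducing in two steps gives the same result as reducing everything at once.

```java
import java.util.List;

// Hypothetical value type: only the fields the reduce step needs, not the whole document.
record SalesValue(double sales, long count) {}

final class ReReduceSketch {
    private ReReduceSketch() {}

    // Combine values for one key. The result has the same shape as each input,
    // so it can be passed back into reduce together with further values.
    static SalesValue reduce(List<SalesValue> values) {
        double sales = 0;
        long count = 0;
        for (SalesValue v : values) {
            sales += v.sales();
            count += v.count();
        }
        return new SalesValue(sales, count);
    }
}
```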
halfer
Raghuveer
  • I doubt MapReduce is going to be faster than aggregation. It is much more flexible than aggregation, but not faster, and it is better suited to ETL jobs than to operational queries. – Alex Blex Oct 15 '18 at 17:49
  • But I think MR is preferred for large datasets. Also, if I have to do a `group By` on multiple fields, will I have to run MR jobs multiple times, especially for a query like my aggregation above? – Raghuveer Oct 16 '18 at 05:01
  • Is there anything in particular that makes you think MR is preferred? https://docs.mongodb.com/manual/core/map-reduce/ recommends the opposite. MR runs slow JavaScript for both the map and reduce functions and makes no use of indexes apart from in filters. Aggregation uses its own compiled functions and applies optimisation algorithms behind the scenes. In general it is considered much faster than MR. Take a look at https://stackoverflow.com/questions/13908438/is-mongodb-aggregation-framework-faster-than-map-reduce or other similar questions. – Alex Blex Oct 16 '18 at 08:25
