Retrieving data from mongo takes too long, even for small datasets. For bigger datasets we get out of memory errors of the javascript engine. We've tried several schema designs and several ways to retrieve data. How do we optimize mongoDB/mapReduce function/MongoWire to retrieve more data quicker? We're not very experienced with MongoDB yet and are therefore not sure whether we're missing optimization steps or if we're just using the wrong tools.
1. Background
For graphing and playback purposes we want to store changes for several objects over time. Currently we have tens of objects per project, but expectations are we need to store thousands of objects. The objects may change every second or not change for long periods of time. A Delphi backend writes to and reads from MongoDB through MongoWire and SuperObjects, the data is displayed in a web frontend.
2. Schema design
We're storing the object changes in minute-second-millisecond objects in a record per hour. The schema design is like described here. Sample:
o: object1,
dt: $date,
v: {0: {0:{0: {speed: 8, rate: 0.8}}}, 1: {0:{0: {speed: 9}}}, …}
We've put indexes on {dt: -1, o: 1}
and {o:1}
.
3. Retrieving data
We use a mapReduce to construct a new date based on the minute-second-millisecond objects and to put the object back in v:
o: object1,
dt: $date,
v: {speed: 8, rate:0.8}
An average document is about 525 kB before the mapReduce function and has had ~29000 updates. After mapReduce of such a document, the result is about 746 kB.
3.1 Retrieving data from through mongo shell with mapReduce
We're using the following map function:
function mapF(){
for (var i = 0; i < 3600; i++){
var imin = Math.floor(i / 60);
var isec = (i % 60);
var min = ''+imin;
var sec = ''+isec;
if (this.v.hasOwnProperty(min) && this.v[min].hasOwnProperty(sec)) {
for (var ms in this.v[min][sec]) {
if (imin !== 0 && isec !== 0 && ms !== '0' && this.v[min][sec].hasOwnProperty(ms)) {// is our keyframe
var currentV = this.v[min][sec][ms];
//newT is new date computed by the min, sec, ms above
if (toDate > newT && newT > fromDate) {
if (fields && fields.length > 0) {
for (var p = 0, length = fields.length; p < length; p++){
//check if field is present and put it in newV
}
if (newV) {
emit(this.o, {vs: [{o: this.o, dt: newT, v: newV}]});
}
} else {
emit(this.o, {vs: [{o: this.o, dt: newT, v: currentV}]});
}
}
}
}
}
}
};
The reduce function basically just passes the data on. The call to mapReduce:
db.collection.mapReduce( mapF,reduceF,
{out: {inline: 1},
query: {o: {$in: objectNames]}, dt: {$gte: keyframeFromDate, $lt: keyframeToDate}},
sort: {dt: 1},
scope: {toDate: toDateWithinKeyframe, fromDate: fromDateWithinKeyframe, fields: []},
jsMode: true});
Retrieving 2 objects over 1 hour: 2,4 seconds.
Retrieving 2 objects over 5 hour: 8,3 seconds.
For this method we would have to write js and bat files runtime and read the json data back in. We have not measured times fort his yet, because frankly, we don’t like the idea very much.
Another problem with this method is that we get out of memory errors of the v8 javascript engine when we try to retrieve data for longer periods and/or more objects. Using a pc with more RAM works to some extend in preventing out of memory, but it doesn't make retrieving data faster.
This article mentions splitVector, which we might use to devide the workload. But we're not sure on how to use the keyPattern and maxChunkSizeBytes options. Can we use a keyPattern for both o
and dt
?
We might use multiple collections, but our dataset isn’t that big to start with at the moment, so we’re worried about how much collections we’d need.
3.2 Retrieving data through mongoWire with mapReduce
For retrieving data through mongoWire with mapReduce, we use the same mapReduce functions as above. We use the following Delphi code to start te query:
FMongoWire.Get('$cmd',BSON([
'mapreduce', ‘collection’,
'map', bsonJavaScriptCodePrefix + FMapVCRFunction.Text,
'reduce', bsonJavaScriptCodePrefix + FReduceVCRFunction.Text,
'out', BSON(['inline', 1]),
'query', mapquery,
'sort', BSON(['dt', -1]),
'scope', scope
]));
Retrieving data with this method is about 3-4 times (!) slower. And then the data has to be translated from BSON (IBSONDocument to JSON (SuperObject), which is a major time consuming part in this method. For retrieving raw data we use TMongoWireQuery which translates the BSONdocument in parts, while this mapReduce function uses TMongoWire directly and tries to translate the complete result. This might explain why this takes so long, while normally it's quite fast. If we can reduce the time it takes for the mapReduce to return results, this might be a next step for us to focus on.
3.3 Retrieving raw data and parsing in Delphi
Retrieving raw data to Delphi takes a bit longer then the previous method, but probably because of the use of TMongoWireQuery, the translation from BSON to JSON is much quicker.
4. Questions
- Can we do further optimizations on our schema design?
- How can we make the mapReduce function faster?
- How can we prevent the out of memory errors of the v8 engine? Can someone give more information on the splitVector function?
- How can we best use of mapReduce from Delphi? Can we use MongoWireQuery in stead of MongoWire?
5. Specs
MongoDB 3.0.3
MongoWire from 2015 (recently updated)
Delphi 2010 (got XE5 as well)
4GB RAM (tried on 8GB RAM as well, less out of memory, but reading times are about the same)