1

I have the following mongoose schema:

var dataSchema = new Schema({  
    owner: { type: Schema.ObjectId, ref: 'User' },  
    time : { type: Date, default: Date.now },  
    eventCount:Number  
});

For some dataObjects the eventCount is defined and a positive number, for other dataObjects the eventCount is undefined. I want to design an index such that queries like this are as fast as possible:

db.datacollection.find({owner: <some ID>, eventCount: {$exists:true}, time: {<some time range>})

What is the best way to do this?

Here is the best solution that I could think of but I'd be interested in knowing if anyone has a better solution:

Add an isEventCount boolean variable to the dataSchema. Set up mongoose middleware so that isEventCount is calculated before saving the object to db with the following logic.

if(eventCount > 0) {
    isEventCount = true;
} else {
    isEventCount = false;
}

then building an index like this

db.datacollection.ensureIndex({user:1, isEventCount: 1, time:1})

and running my query like this

db.datacollection.find({owner: <some ID>, isEventCount: true, time: {<some time range>})

There are a couple drawbacks to this approach. Namely:

  1. I am saving redundant information in the database.
  2. I have to write additional middleware code to achieve this.
  3. I have to modify existing entries in the db.

Is anyone aware of either a better solution or a library that can help with this?

Jacob Horbulyk
  • 2,366
  • 5
  • 22
  • 34

1 Answers1

0

You don't need to go thru all that to get decent performance out this query. Using your existing query:

db.datacollection.find({owner: <some ID>, eventCount: {$exists:true}, time: {<some time range>}

this index:

db.datacollection.ensureIndex({user:1, eventCount: 1, time:1})

in most cases should give you quite good performance. Obviously I don't know the size or distribution of your existing data set, but I think it would require something fairly unusual for you to need to create an isEventCount flag.

Generally speaking before taking any unusual measures with schema design in MongoDB I would do the following:

1) Set up test data with reasonable volumes 2) Try your queries using explain()

It will give you a pretty good idea of how well the query will perform and how/when it is using indexes.

http://docs.mongodb.org/manual/reference/method/cursor.explain/#cursor.explain

John Petrone
  • 26,943
  • 6
  • 63
  • 68
  • I tried this. This doesn't work. Imagine that I have the following data: {owner: UserA, dataCount: 1, time: yesterday} {owner: UserA, dataCount: 1, time: lastYear} - Imagine that there are 1000000 records like this {owner: UserA, dataCount: 2, time: yesterday} The index that you described above would order the records in the way that I described above. And then if I were to run the query: find({owner: UserA, eventCount:{$exists: true}, time: {since yesterday}}) the cursor would have to scan all the records from last year. – Jacob Horbulyk Jun 11 '14 at 15:38
  • "doesn't work" means the query fails? or you don't like the performance? you can drop the time field from the index - as I said I don't know the size and distribution of your data, only you do. the point is you don't need the calculated field "isEventCount". On any event you should be benchmarking the various approaches to see what works for you - no one schema/indexing approach will work for all people and all use cases. – John Petrone Jun 11 '14 at 15:54
  • Doesn't work as in I don't like the performance. I want to design my indexes in a way where for this particular query, I want `nscanned` to equal the number of objects found. With my suggested scheme that is possible while with your scheme that isn't possible. (This is the case for the following toy dataset: `{ "datacount" : 1, "time" : ISODate("2013-06-10T15:29:32.603Z"), "exists" : true } { "datacount" : 2, "time" : ISODate("2014-06-10T15:29:32.603Z"), "exists" : true } { "datacount" : 1, "time" : ISODate("2014-06-10T15:29:32.603Z"), "exists" : true }` ) – Jacob Horbulyk Jun 12 '14 at 08:54