I'm new to MongoDB and trying to make sure I set up the proper indexes. I've seen similar questions about composite indexes here but none that exactly cover the situation I'm in.

Note: I'm using Rails 3.2 and Mongoid.

I have a collection of Events that will always be sorted by date (and often searched on it), but generally with another parameter as well. For example, I might want to find the Events that match a particular set of categories within a certain date range, or the Events that match a particular person within a certain date range. The types of searches will be:

  1. Always by date (or at least sorting by date)
  2. Often by category
  3. Sometimes additionally by [person, venue, or keyword]

The first solution I came up with was multiple composite keys that all start with date and category, like so:

class Event
...

  # date + category + person
  index([
    [:date, Mongo::DESCENDING],
    [:category_id, Mongo::ASCENDING],
    ["people.person_id", Mongo::ASCENDING]
  ])
  # date + category + venue
  index([
    [:date, Mongo::DESCENDING],
    [:category_id, Mongo::ASCENDING],
    [:venue_id, Mongo::ASCENDING]
  ])
  # date + category + keyword
  index([
    [:date, Mongo::DESCENDING],
    [:category_id, Mongo::ASCENDING],
    [:keywords, Mongo::ASCENDING]
  ])
end

But it seems a little funny to me to keep overlapping the "date + category_id" index, and also what about the cases when I'm not searching on category_id?

UPDATE: dcrosta asked what kind of queries would be running, and how frequently. Without knowing exactly, I can guess that it would look something like the following:

Very frequent:

  • by date
  • by date + category
  • by date + keyword
  • by date + category + keyword

Somewhat frequent:

  • by date + person
  • by date + venue

Less frequent:

  • by date + category + venue
  • by date + category + person
kaptron
  • Can you enumerate all the query types you will be running, like `db.event.find({date: {$gt: ..., $lt: ...}, category_id: ...})`, `db.event.find({date: {$gt: ..., $lt: ...}, "people.person_id": ...})`, and then characterize approximately how frequently they are run? With this information in hand it should be pretty simple to figure out which indexes will help the most. – dcrosta Mar 02 '12 at 22:56
  • Have you run explain on your queries to see whether you really need an index? – shingara Mar 03 '12 at 22:01

1 Answer

OK, given those queries, here are the indexes I would create:

db.events.createIndex({date: 1, category_id: 1})
db.events.createIndex({date: 1, keywords: 1})

Either of these indexes can be used for queries by date only, and either can be used for date + category + keyword. Which one is chosen in the last case will depend on the selectivity of the two fields and the particular query in question.
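The reasoning above rests on the fact that a compound index can serve any query that constrains a leading prefix of its fields. Here is a rough sketch of that rule in plain JavaScript. It is a simplified model, not MongoDB's actual query planner, and the helper names (`usablePrefixLength`, `candidates`) and index labels are made up for illustration. The field names follow the question (`category_id`, `keywords`, `venue_id`).

```javascript
// Simplified model of the index prefix rule; NOT MongoDB's real query planner.
// indexKeys:   ordered field names of a compound index.
// queryFields: fields the query filters or sorts on.
// Returns how many leading index fields the query constrains.
function usablePrefixLength(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  while (n < indexKeys.length && wanted.has(indexKeys[n])) {
    n += 1;
  }
  return n;
}

// The two suggested indexes; the labels on the left are hypothetical.
const indexes = {
  date_category: ["date", "category_id"],
  date_keywords: ["date", "keywords"],
};

// For one query shape, report the prefix length each index offers.
function candidates(queryFields) {
  const result = {};
  for (const [name, keys] of Object.entries(indexes)) {
    result[name] = usablePrefixLength(keys, queryFields);
  }
  return result;
}

console.log(candidates(["date"]));                            // date only
console.log(candidates(["date", "category_id", "keywords"])); // date + category + keyword
console.log(candidates(["date", "venue_id"]));                // date + venue
```

In this model, a date-only query and a date + venue query each match only the `date` prefix of both indexes, while date + category + keyword matches two fields of either one, which is why the planner's choice there comes down to selectivity. Running `explain()` on the real queries is still the way to confirm which index is actually picked.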

You may also want an index on date by itself, which will serve as a catch-all for the remaining queries. Whether this helps much depends on the volume of data and on what "somewhat frequent" actually means.

More generally speaking, and addressing your initial question, indexes in MongoDB, like in any database, will increase the performance of the queries they match, at the cost of slightly degrading the performance of updates/inserts/deletes (since the index must be modified along with the underlying data). My approach is to build indexes for those queries which I know will be either very costly or very frequent, and then test using a realistic distribution of load (i.e. a realistic number and frequency of queries and updates/inserts/deletes) to see which other queries are more costly than expected. You can use the database profiler to assist in collecting this information, possibly with a tool like Professor (#shamelessplug) to assist in understanding the results.
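As a rough illustration of what to do with profiler output, the sketch below groups profiler documents by namespace and operation and ranks the groups by total time. It assumes each document carries the profiler's `ns`, `op` and `millis` fields; the function name `summarizeProfile` is made up for this example, and this is not how Professor works internally.

```javascript
// Group profiler documents (as stored in db.system.profile) by namespace and
// operation, then rank the groups by total time spent.
// Assumes each entry has the profiler's `ns`, `op` and `millis` fields.
function summarizeProfile(entries) {
  const groups = new Map();
  for (const e of entries) {
    const key = e.ns + " " + e.op;
    const g = groups.get(key) || { key: key, count: 0, totalMillis: 0, maxMillis: 0 };
    g.count += 1;
    g.totalMillis += e.millis;
    g.maxMillis = Math.max(g.maxMillis, e.millis);
    groups.set(key, g);
  }
  // Most expensive group first.
  return Array.from(groups.values()).sort((a, b) => b.totalMillis - a.totalMillis);
}
```

In the mongo shell, one way to feed it would be to enable profiling with `db.setProfilingLevel(2)`, run a representative load, and then call `summarizeProfile(db.system.profile.find().toArray())`. Groups with a high total or maximum time are the first candidates for a new index.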

dcrosta