2

Indexes optimization:

We reviewed all our indexes on the expensive collection [170M documents] .

We started to removed most on the indexes ; There are 2 main indexes remained [not counting the primary Key]. This trimmed the total index size to 1/3 of its prior value.

In high level we went with the strategy of using “compound indexes”; based on common ground of all properties within the documents used in the queries on the collection.

There are two main use cases used querying this collection.

Online
• There are two dedicated indexes for these two queries i.e. InfoId_1 (equals) and Source_1_StatusId_1_SoundExWord12_1 (equals, equals, in)

Offline – we are using [M-R] here. • There is a dedicated compound index for all query used in this use case. All the queries are using most of the fields set in the index. E.g. index name Source_1_StatusId_1_InfoIdHash_1_InfoUpdateDate_1_SoundExWord12_1 (equals, equals, range, range, for sort)

Results so far:

After examining the execution plan in MongoDB its seems that the dedicated index for offline operations is rejected in production for the dedicated index for online operations, causing a 270M result scan instead of a 130K result scan (in QA the correct index was selected in most cases). For some reason is doesn’t use the right index it is unclear why.

More research/tests we’ve done so far: Once we removed the online index the offline process ; M-R offline process started to work with good performances, thus we are sure it all a matter applying the rights indexes here.

Question/considerations:

There are two alternatives here: here I need you help in either achieving the 1st alertnative or optimizing the 2nd one:

  1. Adding a hint to the map reduce query (/query+sort) (via C# 2.0 legacy driver) to get MongoDB to select the right index – can’t find any way to do this. Once using the command db.infosoundexpair.find({$and: [ { "Source": { $eq: "XXX" } }, { "StatusId": { $eq: 0 } }, { "InfoIdHash": { $gte: -2147483648 } }, {"InfoIdHash": { $lte: -2143862259 } }, { "InfoUpdateDate": { $lt: ISODate("2015-06-04T00:00:00.000Z") } } ]}).sort({"SoundExWord12": 1}).hint("Source_1_StatusId_1_InfoIdHash_1_InfoUpdateDate_1_SoundExWord12_1").explain("executionStats"); the correct index (defined in the hint) was used.

  2. Recreating the dedicate online index with a different order of the fields in both the index and the online query prevents the offline process from using this index, enables the online process to use this index, however, does decrease the speed of the online process. We believe that the decrease in speed is since the field we moved from the end of the index to the beginning of the index (the one not used in the offline query, just in the offline sort) is used with contains ($in) instead of equals ($eq) with ~25 potential values, whereas the other two fields are equality comparisons.

Sample document:

{
    "_id" : "766574b0-0f3d-4e0a-b979-a6024bf6fbbb/cemcdeomc/vodemer",
    "InfoId" : LUUID("b0746576-3d0f-0a4e-b979-a6024bf6fbbb"),
    "Source" : "OFAC",
    "StatusId" : 0,
    "InfoUpdateDate" : ISODate("2015-06-09T13:55:25.863Z"),
    "SoundExWord12" : "cemcdeomc/vodemer",
    "InfoIdHash" : -547866111
}
Danny Varod
  • 17,324
  • 5
  • 69
  • 111
ron
  • 625
  • 2
  • 6
  • 17
  • Can you provide a sample document, the `.getIndexes()`, and your queries? This would make it much easier to reproduce the issue. And what's your MongoDB version? – xeraa Jun 09 '15 at 09:15

0 Answers0