MongoDB index on many (nested) fields/attributes

Question

In e-commerce application I have documents like this:

{ category:'A', ..., price:122,
  attr:{ width:6, height:4, hasLCD:true, lcdType:'some text', ..., a36:null }
}

I.e. every product has many attributes of various simple types.

Now I want to filter products by dynamic queries containing top level fields plus some attributes. For example:

find({category:'A', price:{$lt:200}, ...,
     'attr.height':{$lt:6}, 'attr.hasLCD':true, 'attr.lcdType':{$in:[...]}, ...})

And I'd like this to perform fast.

Trying to index on all possible 'attr.*' variants gives me an error (too many compound keys). I also suspect that if I index it that way and then omit one of attrs in query index won't work.

Trying to index on 'attr' as a whole does not help either.

What is the proper way to model this under MongoDB?

Update

I have tried this approach (also mentioned here). I.e. store attributes as array of key-value pairs:

attr2: [ {tag:'lcgType', value:'some text'}, ...

And index it like this:

ensureIndex({ 'attr2.tag':1, 'attr2.value':1 })

And query like this:

find({attr2:{$all:[
  {$elemMatch:{tag:'bestseller',value:true}},
  {$elemMatch:{tag:'weight',value:{$lte:100}}}
]}})

Now explain() says that it is using "BtreeCursor attr2.tag_1_attr2.value_1" but still "nscanned" : 31607 and the whole execution time have actually increased (compared to non-indexed scenario).

Something is wrong here.

Sub-question

What if I select some (less than 31) most frequently queried attributes and try to index on those. If I put all of them in single compound index:

ensureIndex({'attr.a1':1, 'attr.a2':1, ...})

According to the docs this index won't be used for queries missing attr.a1 attribute.

How to define index in this case?

Do you have more than [31 fields](http://docs.mongodb.org/manual/reference/limits/#Number%20of%20Indexed%20Fields%20in%20a%20Compound%20Index) ? — Yann Moisan, Oct 30 '13 at 20:10
Creating a compound index for both category and price should be sufficient. Especially if you can fit your collection into RAM. Indexing each and every attribute will blow your disk space and eat up all your RAM. There is probably a reasonable limited amount of your documents (Cell phones? Cameras?). How many can you except per category and price range? few dozens probably. Even if you have to hit the disk that is not so bad — Ori Dar, Oct 30 '13 at 21:05
When querying at shop's top level, i.e. all categories, with some attributes it has to scan all 32 thousand products. It takes more than a second on an idle server. This is not too bad but not too good either. — Yaroslav Stavnichiy, Oct 30 '13 at 21:30
OK. But if you can restrict your query by category and price (range), you can avoid full scan — Ori Dar, Oct 30 '13 at 23:13
Unfortunately it is not me who decides on which category set to filter. It is end user. And users tend to filter the whole set. They want to find all products with particular lcdType regardless of brand (which is category in my case). — Yaroslav Stavnichiy, Oct 31 '13 at 08:43
although the post is a year old: This article adds some useful information to it: http://askasya.com/post/dynamicattributes (it seems the situation has been changed since mongoDB 2.4): " In 2.6 the order of expressions inside $all does not make a difference as the one that's most selective will be the one used by the query optimizer." — Matthias, Jan 07 '15 at 16:52

score 2 · Answer 1 · edited May 23 '17 at 11:57

If you really have to allow a lot of filters, combinations and possibly even sorts, MongoDB is not a good fit because it uses only one index per query. The number of indexes then grows way too fast, because compound keys are somewhat inflexible (that should answer the subquestion) and becomes a performance hog.

Use a search database like ElasticSearch, SolR, etc. instead that comes with the features you need. You can the use a $in on the ids that the search server returned if you want to keep the base information in MongoDB (it's usually a good idea to have the search database simply replicate the information of the primary data store so you don't need to sync changes two-way, which would be a nightmare)

MongoDB index on many (nested) fields/attributes

1 Answers1