3

According to this post compound indexes are bigger in terms of dimensions (I could not find much info on docs, so if you could point me there I would be grateful).

Suppose I have to search for the whole address (we can assume I will always have all the fields available both in collection and in the query) through a collection of addresses like

{
  name: String,
  street: String,
  postcode: String,
  City: String,
  Country: String
}

My question is: how bigger a compound index would be? If a compound index is bigger then a single field wouldn't it be better to add a hash of the concatenation of all values to all objects, add a single index to the hash field and search by that (although it do not sounds like a good practice)?

Community
  • 1
  • 1
bncc
  • 478
  • 7
  • 18

1 Answers1

3

If a compound index is bigger then a single field wouldn't it be better to add a hash of the concatenation of all values to all objects, add a single index to the hash field and search by that (although it do not sounds like a good practice)?

These accomplish different things. A compound index has an order and that order has an effect. For instance, the index { 'country' : 1, 'city' : 1, 'postcode' : 1 } would allow to search for all address in a specific city of a specific country. A hash can't do that - hashes only support exact matches.

I don't see how this is bad practice at all, it's just a very narrow use case. Remember than every slight difference in spelling, additional white spaces, etc. will result in different hash values and that you can't even answer simple question like "how many address in country X do we store?". But if you don't need that, why not?

By the way, MongoDB has built-in support for this. If the address is embedded, using a hashed index on the entire subdocument will accomplish what you need:

MongoDB supports hashed indexes of any single field. The hashing function collapses embedded documents and computes the hash for the entire value,

e.g.:

> db.hash.insert( {"name": "john", "address" : { "city" : "Chicago", "state":"IL",
                   "country" : "US" } } );
WriteResult({ "nInserted" : 1 })
> db.hash.createIndex( { "address" : "hashed" } );
...
>
> This query uses the index and finds the document:
> db.hash.find({"address" : {"city" : "Chicago", "state": "IL", "country" : "US" } } );
>
> // this query wont find the document b/c of missing state, but is still fast (IXSCAN)
> db.hash.find({"address" : {"city" : "Chicago", "country" : "US"  } } );
mnemosyn
  • 45,391
  • 6
  • 76
  • 82
  • 1
    Thank you very much for the answer which pretty much fits my need. I'll accept it but I would also like to see if someone gives more info or reference for what concerns space consumption – bncc Mar 12 '15 at 14:05