5

Say I have a mongo collection which has a fix number of entries, which would never exceed a count of 300-400. Example:

User{
String name;
String phoneNumber;
String address;
String dob;
Integer noOfCars;
}

Of these fields, I would like to index name and phoneNumber.

Is creating an index for such small collections advisable? Does that decision depend at all on the size of collection? Does it depend on the number of indexes I want to create?

tunetopj
  • 561
  • 2
  • 5
  • 15
  • When we refrain from creating extra indexes on a field we would not regularly query, we are making this choice as the cost of index creation overpowers the benefits in provides. On similar lines, I am trying to ask if it makes sense to pay the cost of index creation for a small static collection. – tunetopj Feb 15 '15 at 17:28

3 Answers3

6

It doesn't matter. I just tried this on a sample collection that has 384 entries. According to explain(), the index scan took 0ms, while the first collection scan took 2ms - every following collection scan took 0ms, too.

Does that decision depend at all on the size of collection?

Yes, the idea of an index is that it adds cost for creating and updating data which is amortized by making queries faster. In particular, a simple list has an asymptotic insert performance of O(1) and a search time of O(N), while a B-Tree has O(log n) for both, i.e. we accept slower inserts because we assume we read more often than we write, or the data is so large that even a few O(N) reads would be impacting performance, i.e. if N >> log N.

At only a few hundred elements, all this doesn't matter much because the difference between log n and n is small, and because the more complex algorithm's runtime overhead (i.e., the constant factor that is hidden through the Landau-Notation because it's largely implementation-dependent) plays in the same league. The same applies to your code: it doesn't make sense to put 200 elements in a hashtable, a list iteration might even be faster because it avoids branching.

If the documents are huge however, the collection scan will have to wrangle more data (instead of just looking at the index).

mnemosyn
  • 45,391
  • 6
  • 76
  • 82
  • This answer is so packed with info that it opens up a bunch of derivative questions. That's a great answer. – Manuel Feb 19 '20 at 22:39
3

Is creating an index for such small collections advisable?

This might be an opinion, being that the collection is so small and the DB might have optimizations for such small collections. My opinion would be to do it but there are pros and cons.

con: Increased system complexity. This is akin to the more LOC's you have the more bugs you may have.

pro: Would future proof the collection should the usage increase or the collection size increase.

Does that decision depend at all on the size of collection?

Yes it does. And barring any DB optimizations that may occur on such small collection, it also depends on usage.

Does it depend on the number of indexes I want to create?

More indexes increase the write time, but this would need to be tested for your particular setup. Nothing beats real tests since there are many factors at play. I know that in previous projects we have used TokuMX for MongoDB and have seen amazing write perofrmance... 2 minutes with Toko vs 12 minutes for regular mongo on writing 500k entries with 19 indexes.

Jose Martinez
  • 11,452
  • 7
  • 53
  • 68
0

I think you should. Persistence storage is hardly an issue. Also index of small collection is also small. It also depends on the volume of queries. If there is high volume of queries, then even slight improvement to individual queries will aggregate to huge performance improvement.

Ishan Jain
  • 262
  • 2
  • 14