MongoDB scaling question (do indexes affect 'distinct' performance)?

Question

I'm using MongoDB to store, day by day, all the "ticks" for a set of about 40 equities. These ticks contain the trade info (a document containing price and volume) and the book info (a more complex document containing buy and sell proposals). The order of magnitude is about 5K trades + 20K books * 40 equities per day. Documents are indexed by symbol (the equity name), date of insert, and time of day. After a week of collecting, one of my queries no longer scales: looking up the distinct dates takes too long. So I decided to keep a special document just to record that there is a "collection" for a certain day. Is this a correct approach? Furthermore, is it correct to store ticks as separate small documents, or would it be better to store them as an array on the equity document?

Thanks all !

BTW this question is a consequence of this one: Using mongodb for store intraday equity data

Addition: even if I explicitly run (at the console)

db.books.ensureIndex({dateTag:1})
db.books.distinct("dateTag")

it replies slowly. So maybe a better question is: does an index affect distinct performance?

Addition: after upgrading to 1.8.2, the behavior is the same.

score 2 · Accepted Answer · answered May 27 '11 at 17:19

does index affect the distinct performance ?

It does indeed. However, there's no "explain plan" for distinct, so this can only be confirmed via the docs or the code.

Documents are indexed by symbol (the equity name), date of insert, and time of day

I'm not 100% clear on how many indexes you have or what type of memory footprint you have here. Just having an index does not necessarily mean that it's going to be really fast. If that index is not in memory, then you end up going to disk and slowing down your query.

If you're seeing slow performance on this query despite the index, I would check two things:

Disk activity (during the query)
Data size relative to memory
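
The second check can be sketched as a small helper. This is only an illustration under stated assumptions: `memoryFootprint` and its `ramBytes` argument are hypothetical names, and `stats` is assumed to be the object returned by `db.books.stats()` in the shell, whose `size` and `totalIndexSize` fields are reported in bytes.

```javascript
// Hypothetical helper, not part of MongoDB: compare a collection's
// footprint with the memory MongoDB can realistically use.
// `stats` is assumed to come from db.books.stats() (sizes in bytes);
// `ramBytes` is a number you supply yourself.
function memoryFootprint(stats, ramBytes) {
  var indexBytes = stats.totalIndexSize; // all indexes on the collection
  var dataBytes = stats.size;            // document data, without indexes
  return {
    indexRatio: indexBytes / ramBytes,
    totalRatio: (indexBytes + dataBytes) / ramBytes,
    indexesFit: indexBytes <= ramBytes
  };
}
```

In the shell this could be called as `memoryFootprint(db.books.stats(), 4 * 1024 * 1024 * 1024)` for a machine with roughly 4 GB available. A ratio well above 1 suggests that queries are likely to hit the disk.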

However, it may just be easier to keep a list of "days stored". That distinct query is probably going to get slower as the data grows, even with an index, so it's never going to be as fast as reading a document that simply lists the days.
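
A minimal sketch of the "days stored" idea, assuming a hypothetical `days` collection keyed by the same `dateTag` value used in `books`. The names `recordBook`, `storedDays` and `hasBooks` are illustrative and not part of MongoDB. `db` is the shell's database handle, and the third argument to `update` is the legacy shell's positional upsert flag.

```javascript
// Hypothetical helpers for the mongo shell; `db` is passed in explicitly.
// Assumes each book document carries a `dateTag` field, as in the question.
function recordBook(db, book) {
  db.books.insert(book);
  // Upsert one marker document per day. If the marker already exists,
  // the $set leaves it effectively unchanged.
  db.days.update({ _id: book.dateTag }, { $set: { hasBooks: true } }, true);
}

// Listing the days reads a handful of tiny marker documents instead of
// running distinct() over every book.
function storedDays(db) {
  return db.days.find().toArray().map(function (d) { return d._id; });
}
```

The trade-off is one extra small write per insert in exchange for a read whose cost depends on the number of days rather than the number of ticks.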

Eventually I used the "days stored" document. The DB activity is high anyway, since I'm running the query while new data comes in. Anyway, as you guessed, having the days stored solves the problem. – Felice Pollano, May 28 '11 at 07:54

score 1 · Answer 2 · answered May 26 '11 at 07:55

I don't think that your "collection for a certain day" approach would work out, because you would run into MongoDB's limit of 24,000 namespaces per database. Storing the ticks in an array property of a document could make it harder to execute certain types of queries (it really depends on what types of reports you need to run on the ticks).

Are you sure that you have indexes in place for the properties you use in your problematic query? As a last resort you could try sharding, but I doubt that is necessary at this point.

Oliver Weichhold

There are actually just two collections: trades and books. They both contain a lot of documents. Should I worry about namespaces as well? The query that no longer scales is the one selecting the distinct values of the date field on the books collection, even though that field is indexed. – Felice Pollano May 26 '11 at 08:01
You would only need to worry about the namespace limit if you were to use a separate collection for every day of trade data. Did you check whether that index is actually being used for that query? If you don't know how, I suggest reading http://www.mongodb.org/display/DOCS/Optimization#Optimization-Explain. – Oliver Weichhold May 26 '11 at 08:07

score 0 · Answer 3 · answered May 26 '11 at 10:25

http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct

clearly states that distinct() can use indexes starting with MongoDB 1.7.3

Well, with 1.8.2 and one week of data the query takes 3 sec. It is better than 1.6.x, but it does not seem to me the way to go... – Felice Pollano May 26 '11 at 10:54
