2

Documents in the collection follow the schema as given below -

{
  "Year": String,
  "Month": String
}

I want to accomplish the following tasks -

  1. I want to run distinct queries like

    db.col.distinct('Month', {Year: '2016'})
    

    I have a compound index on {'Year': 1, 'Month': 1}, so by intuition looks like the answer can be computed by looking at indices only. The query performance on a collection in millions is really poor right now. Is there any way that this could be done?

  2. I want to find the latest month of a given year. One way is to sort the result of the above distinct query and take the first element as the answer. A much better and faster solution as pointed out by @ Blakes Seven in the discussion below, would be to use the query db.col.find({ "Year": "2016"}).sort({ "Year": -1, "Month": -1 }).limit(1)
hyades
  • 3,110
  • 1
  • 17
  • 36
  • Why are you running any kind of aggregation on this when the result is just going to be at most 12 possible values? The only point could be to find the missing gaps. So if this is just historical analysis then I would just say *"live with it"* and if you have lots of data it will take lots of time. If this is an ongoing "application design point", then you would be better to "pre-aggregate" the data in a single document per "year" just insert "months" as created. That way you would know if a "month" was skipped. – Blakes Seven Apr 01 '16 at 21:10
  • If the year is ongoing, then I like to find the latest month (in my use case this isn't necessarily the current month going on :) ). I could do a $addToSet, but that is just too many update operations going on. @BlakesSeven – hyades Apr 01 '16 at 21:18
  • The *"latest month"* is certainly not determined by `.distinct()` nor would it be by `$addToSet` since "sets" are un-ordered. You are not explaining things as clearly as you think and I am offering you "fresh eyes" on the problem, if only you would only explain what you really need to do rather than what you "think" you need to do. "Fresh eyes" often see things differently to the tunnel vision you develop when you are just trying to make something work. Stop telling us how you think you need to solve it. Describe the problem instead. – Blakes Seven Apr 01 '16 at 21:34
  • Haha :) I am finding the latest month by sorting the results of the distinct array. My actual problem involves Year, Month, Day, Hour , which would get solved if the Year, Month gets solved. I need the latest value of the last dimension, to figure out the latest data point. – hyades Apr 01 '16 at 21:42
  • What is wrong with `db.col.find({ "Year": "2016"}).sort({ "Year": -1, "Month": -1 }).limit(1)` ? Always returns the latest sorted. For that matter use a BSON `Date` instead, as it's a lot more compact than "strings". Seriously, one last time. **Edit** your question. Distinct is way off track and your question is presently completely unclear. Present upvotes are "lucK" rather than a job well done. Give your question some love and take the time to explain the problem properly. No that I think there really is a better way than just fecthing the single largest result from a sort. – Blakes Seven Apr 01 '16 at 21:52
  • Not to mention that if you actually did `"Month": "1"` as opposed to `"Month": "01"` then you are doing yourself a great harm since that won't sort lexically. Use numeric or date datatypes instead. – Blakes Seven Apr 01 '16 at 21:54
  • I'm not sure the MongoDB optimizer is able to see that your query could be resolved entirely within the index, without accessing the real collection. Anyway, such an index is in general poor, because too many documents matches the same key _(year, month)_: it's the classical situation where even an advanced RDBMS optimizer would never use it. Given the whole picture, my advice is to add a BSON data field to the collection and index it. Otherwise, just sort back and get last document. – dbra Apr 02 '16 at 07:30
  • @BlakesSeven updated question. Thanks for the suggestion. Works good. Only problem is that I would need to revisit this in case I ever need the 2nd latest and so. Btw, I am storing it in padded form, so '01' < '02' – hyades Apr 02 '16 at 07:46

0 Answers0