0

I have a mongo collection with 7 million documents, besides a couple of other fields each document has a 'createdAt' Date object. I have an index 'createdAt:1' on the field and it's hosted at a dedicated mongo service.

When I try to group by day the query gets real slow. Here is my aggregation query:

{
    "$match": {
        "createdAt": {
            $gte:new Date(1472189560111)
        }
    }
}, 
{
    "$project": {
        "date": 
        {
            "$dateToString": {
                "format": "%Y-%m-%d",
                "date": "$createdAt"
            }
        },
        "count": 1
    }
}, 
{
    "$group": {
        "_id": "$date",
        "count": {
            "$sum": 1
        }
    }
}, 
{
    "$sort": {
        "_id": 1
    }
}, 
{
    "$project": {
        "date": "$_id",
        "count": 1,
        "_id": 0
    }
}

What's a good strategy to improve the performance? Is there a problem in my aggregation pipeline? Do I need a field that contains the day date object with a fixed time like 00:00 and group on that? It seems such a basic operation that I believe there has to be a mongodb native way of doing that.

Stan Wiechers
  • 1,962
  • 27
  • 45
  • 1
    Is there actually an index on the `createdAt` field? That will improve performance quite a lot. Also don't use a separate `$project` stage, as this effectively passes through all matching data and rewrites the field. Instead do your `$dateToString` in the `$group` directly, where it will compact as it goes. Better yet, don't use a string and use ["Date Math"](https://stackoverflow.com/a/26814496/2313887) instead. Uses less space. Which is why you use a `Date` object and not a string in the first place. Also, useless `$project` at the end of the pipeline. All newbie errors really. – Neil Lunn Jul 19 '17 at 07:36
  • https://gist.github.com/whoisstan/ba896ca69945549dea18e8bfdfd12296 Much faster indeed, the project phase was indeed useless. Any more improvements you can suggest? I do have an index. – Stan Wiechers Jul 19 '17 at 08:13
  • 1
    That's pretty much as optimized as you get ["without getting a bigger boat".](http://www.urbandictionary.com/define.php?term=We%27re%20Gonna%20Need%20a%20Bigger%20Boat). You still have not confirmed whether an index is present or not. Is it? So the two things to do are 1. Use an index for selection. 2. Don't include unneeded extra processing. After that, it's all about the "size of results", and if it's a lot of things to "crunch" through, then it takes some time. Hence then a "hardware" problem. – Neil Lunn Jul 19 '17 at 08:50
  • Great, I do have an index. – Stan Wiechers Jul 19 '17 at 10:31

0 Answers0