3

I have a collection with documents like below:

{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-05T00:00:00Z")},
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")},
{startDate: ISODate("2016-01-07T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")}

I would like to return a record for every date between the minimum startDate and the maximum endDate. Along with each of these records I would like to return a count of the number of records where the startDate and endDate contain this date.

So for my above example the min startDate is 1/2/2016 and the max endDate is 1/10/2016 so I would like to return all dates between those two along with the counts. See desired output below:

{date: ISODate("2016-01-02T00:00:00Z"), count: 2}
{date: ISODate("2016-01-03T00:00:00Z"), count: 2}
{date: ISODate("2016-01-04T00:00:00Z"), count: 2}
{date: ISODate("2016-01-05T00:00:00Z"), count: 4}
{date: ISODate("2016-01-06T00:00:00Z"), count: 3}
{date: ISODate("2016-01-07T00:00:00Z"), count: 4}
{date: ISODate("2016-01-08T00:00:00Z"), count: 4}
{date: ISODate("2016-01-09T00:00:00Z"), count: 2}
{date: ISODate("2016-01-010T00:00:00Z"), count: 2}

Please let me know if this doesn't make sense and I can try to explain in more detail.

I am able to do this using a loop like below:

var startDate = ISODate("2016-01-02T00:00:00Z")
var endDate = ISODate("2016-02-10T00:00:00Z")
while(startDate < endDate){
  var counts = db.data.find(
      {
        startDate: {$lte: startDate},
        endDate: {$gte: startDate}
      }
    ).count()
  print(startDate, counts)
  startDate.setDate(startDate.getDate() + 1)
}

But i'm wondering if there is a way to do this using the aggregation framework? I come from a mostly SQL background where looping to get data is often a bad idea. Does this same rule apply for MongoDB? Should I be concerned about using looping here and try to use the aggregation framework or is this a valid solution?

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
Abe Miessler
  • 82,532
  • 99
  • 305
  • 486

2 Answers2

1

Your best bet here is mapReduce. This is because you can loop values in between "startDate" and "endDate" within each document and emit for each day ( or other required interval ) between those values. Then it is just a matter of accumulating per emitted date key from all data:

db.collection.mapReduce(
    function() {
        for( var d = this.startDate.valueOf(); d <= this.endDate.valueOf(); d += 1000 * 60 * 60 * 24 ) {
            emit(new Date(d), 1)
        }
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)

This produces results like this:

{
        "results" : [
                {
                        "_id" : ISODate("2016-01-02T00:00:00Z"),
                        "value" : 2
                },
                {
                        "_id" : ISODate("2016-01-03T00:00:00Z"),
                        "value" : 2
                },
                {
                        "_id" : ISODate("2016-01-04T00:00:00Z"),
                        "value" : 2
                },
                {
                        "_id" : ISODate("2016-01-05T00:00:00Z"),
                        "value" : 4
                },
                {
                        "_id" : ISODate("2016-01-06T00:00:00Z"),
                        "value" : 3
                },
                {
                        "_id" : ISODate("2016-01-07T00:00:00Z"),
                        "value" : 4
                },
                {
                        "_id" : ISODate("2016-01-08T00:00:00Z"),
                        "value" : 4
                },
                {
                        "_id" : ISODate("2016-01-09T00:00:00Z"),
                        "value" : 2
                },
                {
                        "_id" : ISODate("2016-01-10T00:00:00Z"),
                        "value" : 2
                }
        ],
        "timeMillis" : 35,
        "counts" : {
                "input" : 5,
                "emit" : 25,
                "reduce" : 9,
                "output" : 9
        },
        "ok" : 1
}

Your dates are rounded to a day in the sample, but if they were not in real data then it is just a simple matter of date math to be applied in order to round per interval.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
0

In mongodb aggregate framework there are stages instead of loop. It is a pipeline and it goes through each stage until it reaches the last stage specified. That is why you see a [] when using aggregate framework. there are several stages, to name a few (match, group and project). Take a look at their document it is quite simple. anyways that was very brief. As for your question here is my proposition:

I have not tried this. If you can try this and let me know if it works:

First you only keep those with dates in the range you desire using $match. Then follow that with the $group stage. Example:

db.collection.aggregate{
    [
         {$match: {
             $and : [
                   {startDate: {$gte:ISODate("2016-01-02T00:00:00Z")}, 
                   {endDate: {$lte:ISODate("2016-02-10T00:00:00Z")}

                    ]
         },

         {$group:
             {_id: {startDate:"$startDate",endDate:"$endDate"},
              count:{$sum:1}
             }
          }
    ]
 }

If you want to just group using startDate as in you example replace

_id: {startDate:"$startDate",endDate:"$endDate"

with this:

_id: "$startDate"

I hope that helps

ibininja
  • 1,209
  • 1
  • 15
  • 29
  • This is related but doesn't quite return the data I need. Basically it returns a distinct start/end pair and how many other records have that start/end pair. Again - similar to what I need but not quite the same. – Abe Miessler Feb 08 '16 at 18:31