12

Sample Documents:

{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:66Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }

It's easy to get the aggregated results that is grouped by date. But what I want is to query results that returns a running total of the aggregation, like:

{ time: "2013-10-10" total: 3, runningTotal: 3  }
{ time: "2013-10-11" total: 7, runningTotal: 10 }
{ time: "2013-10-12" total: 5, runningTotal: 15 }

Is this possible with the MongoDB Aggregation?

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
mongonzaga
  • 121
  • 1
  • 3
  • 1
    Can you keep a running total as you go along. That would be the easiest and most efficient, especially since the data isn't changing. The aggregation framework would a pretty expensive way to calculate this sort of static data on the fly. – cirrus Apr 24 '13 at 12:13
  • 1
    there is no way to do this with aggregation framework currently. – Asya Kamsky Apr 24 '13 at 13:13
  • @cirrus Thanks for the answer. I'm not quite sure how do that though... – mongonzaga Apr 24 '13 at 13:23
  • Well I think it would involve several aggregated queries. You couldn't do it in one command. But by calculating as you went along it simply means adding an other field to every entry to keep track of the running total. Depending on your app you could do this as you write data, or you could run a background task to calculate it at the end of each day. But this assumes you're writing the data as you go along, I don't know where your data came from. If the data is already there you'll have to run a query for each day and store that somewhere else. – cirrus Apr 24 '13 at 17:18

4 Answers4

11

EDIT: Since MongoDB v5.0 the prefered approach would be to use the new $setWindowFields aggregation stage as shared by Xavier Guihot.


This does what you need. I have normalised the times in the data so they group together (You could do something like this). The idea is to $group and push the time's and total's into separate arrays. Then $unwind the time array, and you have made a copy of the totals array for each time document. You can then calculated the runningTotal (or something like the rolling average) from the array containing all the data for different times. The 'index' generated by $unwind is the array index for the total corresponding to that time. It is important to $sort before $unwinding since this ensures the arrays are in the correct order.

db.temp.aggregate(
    [
        {
            '$group': {
                '_id': '$time',
                'total': { '$sum': '$value' }
            }
        },
        {
            '$sort': {
                 '_id': 1
            }
        },
        {
            '$group': {
                '_id': 0,
                'time': { '$push': '$_id' },
                'totals': { '$push': '$total' }
            }
        },
        {
            '$unwind': {
                'path' : '$time',
                'includeArrayIndex' : 'index'
            }
        },
        {
            '$project': {
                '_id': 0,
                'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' }  },
                'total': { '$arrayElemAt': [ '$totals', '$index' ] },
                'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
            }
        },
    ]
);

I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations(projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.

Sam Tolmay
  • 269
  • 3
  • 11
3

Starting in Mongo 5, it's a perfect use case for the new $setWindowFields aggregation operator:

// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:66Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([

  { $group: {
    _id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
    total: { $sum: "$value" }
  }},
  // e.g.: { "_id" : "2013-10-11", "total" : 7 }

  { $set: { "date": "$_id" } }, { $unset: ["_id"] },
  // e.g.: { "date" : "2013-10-11", "total" : 7 }

  { $setWindowFields: {
    sortBy: { date: 1 },
    output: {
      running: {
        $sum: "$total",
        window: { documents: [ "unbounded", "current" ] }
      }
    }
  }}
])
// { date: "2013-10-11", total: 7, running: 7 }
// { date: "2013-10-10", total: 3, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }

Let's focus on the $setWindowFields stage that:

  • chronologically $sorts grouped documents by date: sortBy: { date: 1 }
  • adds the running field in each document (output: { running: { ... }})
  • which is the $sum of totals ($sum: "$total")
  • on a specified span of documents (the window)
    • which is in our case any previous document: window: { documents: [ "unbounded", "current" ] } }
    • as defined by [ "unbounded", "current" ] meaning the window is all documents seen between the first document (unbounded) and the current document (current).
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
2

here is another approach

pipeline

db.col.aggregate([
    {$group : {
        _id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
        value : {$sum : "$value"}
    }},
    {$addFields : {_id : "$_id.time"}},
    {$sort : {_id : 1}},
    {$group : {_id : null, data : {$push : "$$ROOT"}}},
    {$addFields : {data : {
        $reduce : {
            input : "$data",
            initialValue : {total : 0, d : []},
            in : {
                total : {$sum : ["$$this.value", "$$value.total"]},                
                d : {$concatArrays : [
                        "$$value.d",
                        [{
                            _id : "$$this._id",
                            value : "$$this.value",
                            runningTotal : {$sum : ["$$value.total", "$$this.value"]}
                        }]
                ]}
            }
        }
    }}},
    {$unwind : "$data.d"},
    {$replaceRoot : {newRoot : "$data.d"}}
]).pretty()

collection

> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }

result

{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
> 
Saravana
  • 12,647
  • 2
  • 39
  • 57
2

Here is a solution without pushing previous documents into a new array and then processing them. (If the array gets too big then you can exceed the maximum BSON document size limit, the 16MB.)

Calculating running totals is as simple as:

db.collection1.aggregate(
[
  {
    $lookup: {
      from: 'collection1',
      let: { date_to: '$time' },
      pipeline: [
        {
          $match: {
            $expr: {
              $lt: [ '$time', '$$date_to' ]
            }
          }
        },
        {
          $group: {
            _id: null,
            summary: {
              $sum: '$value'
            }
          }
        }
      ],
      as: 'sum_prev_days'
    }
  },
  {
    $addFields: {
      sum_prev_days: {
        $arrayElemAt: [ '$sum_prev_days', 0 ]
      }
    }
  },
  {
    $addFields: {
      running_total: {
        $sum: [ '$value', '$sum_prev_days.summary' ]
      }
    }
  },
  {
    $project: { sum_prev_days: 0 }
  }
]
)

What we did: within the lookup we selected all documents with smaller datetime and immediately calculated the sum (using $group as the second step of lookup's pipeline). The $lookup put the value into the first element of an array. We pull the first array element and then calculate the sum: current value + sum of previous values.

If you would like to group transactions into days and after it calculate running totals then we need to insert $group to the beginning and also insert it into $lookup's pipeline.

db.collection1.aggregate(
[
  {
    $group: {
      _id: {
        $substrBytes: ['$time', 0, 10]
      },
      value: {
        $sum: '$value'
      }
    }
  },
  {
    $lookup: {
      from: 'collection1',
      let: { date_to: '$_id' },
      pipeline: [
        {
          $group: {
            _id: {
              $substrBytes: ['$time', 0, 10]
            },
            value: {
              $sum: '$value'
            }
          }
        },
        {
          $match: {
            $expr: {
              $lt: [ '$_id', '$$date_to' ]
            }
          }
        },
        {
          $group: {
            _id: null,
            summary: {
              $sum: '$value'
            }
          }
        }
      ],
      as: 'sum_prev_days'
    }
  },
  {
    $addFields: {
      sum_prev_days: {
        $arrayElemAt: [ '$sum_prev_days', 0 ]
      }
    }
  },
  {
    $addFields: {
      running_total: {
        $sum: [ '$value', '$sum_prev_days.summary' ]
      }
    }
  },
  {
    $project: { sum_prev_days: 0 }
  }
]
)

The result is:

{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }