14

I'm developing a simple financial app for keeping track of incomes and outcomes.

For the sake of simplicity, let's suppose these are some of my documents:

{ description: "test1", amount: 100, dateEntry: ISODate("2015-01-07T23:00:00Z") }
{ description: "test2", amount: 50,  dateEntry: ISODate("2015-01-06T23:00:00Z") }
{ description: "test3", amount: 11,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test4", amount: 2,   dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test5", amount: 12,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test6", amount: 4,   dateEntry: ISODate("2015-01-09T23:00:00Z") }

What I would like now is to draw a "balance" chart, based on such data:

{ day: "2015-01-06", amount: 50  }
{ day: "2015-01-07", amount: 150 }
{ day: "2015-01-09", amount: 179 }

In other words, I need to group all my transactions by day, and for each day I need to sum all of my previous transactions (since the beginning of the world).

I already know how to group by day:

$group: {
   _id: { 
      y: {$year:"$dateEntry"}, 
      m: {$month:"$dateEntry"}, 
      d: {$dayOfMonth:"$dateEntry"} 
   }, 
   sum: ???
}

But I don't know how to go back and sum all the amounts.

Imagine I need to show a monthly balance report: should I run 31 queries, one for each day summing all transaction's amount except next days? Sure I can, but don't think that's the best solution.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
Fabio B.
  • 9,138
  • 25
  • 105
  • 177
  • I don't know how to exactly write the query, Neil. If I simply did {$sum:"$amount"} only the current group transactions would be included in my result, and I don't want so. Please read the question more carefully ;) – Fabio B. Jan 17 '15 at 01:23
  • Sorry. Maybe I can't explain what I exactly want, seriously. I don't want a simple sum of all amounts for each day. I want to sum all the amounts of ALL the days before each day. In my example, 2015-01-07 would be the sum of 6-Jan and 7-Jan amounts, 2015-01-09 would be the sum of 6,7,9 Jan amounts, and so on. – Fabio B. Jan 17 '15 at 01:29
  • why is my question down voted? I thought it would be an interesting one... – Fabio B. Jan 17 '15 at 01:32
  • I really cannot understand... I thought that grouping by day and summing would give me the sum of each day transactions, not the "add up" you're talking about. I mean at 6 I would sum only the 6 transactions, 7 only the 7 and so on. – Fabio B. Jan 17 '15 at 01:42

2 Answers2

14

Actually more suited to mapReduce than the aggregation framework, at least in the initial problem solving. The aggregation framework has no concept of the value of a previous document, or the previous "grouped" value of a document so this is why it cannot do this.

On the other hand, mapReduce has a "global scope" that can be shared between stages and documents as they are processed. This will get you the "running total" for the current balance at end of day you require.

db.collection.mapReduce(
  function () {
    var date = new Date(this.dateEntry.valueOf() -
      ( this.dateEntry.valueOf() % ( 1000 * 60 * 60 * 24 ) )
    );

    emit( date, this.amount );
  },
  function(key,values) {
      return Array.sum( values );
  },
  { 
      "scope": { "total": 0 },
      "finalize": function(key,value) {
          total += value;
          return total;
      },
      "out": { "inline": 1 }
  }
)      

That will sum by date grouping and then in the "finalize" section it makes a cumulative sum from each day.

   "results" : [
            {
                    "_id" : ISODate("2015-01-06T00:00:00Z"),
                    "value" : 50
            },
            {
                    "_id" : ISODate("2015-01-07T00:00:00Z"),
                    "value" : 150
            },
            {
                    "_id" : ISODate("2015-01-09T00:00:00Z"),
                    "value" : 179
            }
    ],

In the longer term you would be best of having a separate collection with an entry for each day an alter the balance using $inc in an update. Just also do an $inc upsert at the beginning of each day to create a new document carrying forward the balance from the previous day:

// increase balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": amount } },
    { "upsert": true }
);

// decrease balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": -amount } },
    { "upsert": true }
);

// Each day
var lastDay = db.daily.findOne({ "dateEntry": lastDate });
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": lastDay.balance } },
    { "upsert": true }
);

How NOT to do this

Whilst it is true that since the original writing there are more operators introduced to the aggregation framework, what is being asked here is still not practical to do in an aggregation statement.

The same basic rule applies that the aggregation framework cannot reference a value from a previous "document", nor can it store a "global variable". "Hacking" this by coercion of all results into an array:

db.collection.aggregate([
  { "$group": {
    "_id": { 
      "y": { "$year": "$dateEntry" }, 
      "m": { "$month": "$dateEntry" }, 
      "d": { "$dayOfMonth": "$dateEntry" } 
    }, 
    "amount": { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } },
  { "$group": {
    "_id": null,
    "docs": { "$push": "$$ROOT" }
  }},
  { "$addFields": {
    "docs": {
      "$map": {
        "input": { "$range": [ 0, { "$size": "$docs" } ] },
        "in": {
          "$mergeObjects": [
            { "$arrayElemAt": [ "$docs", "$$this" ] },
            { "amount": { 
              "$sum": { 
                "$slice": [ "$docs.amount", 0, { "$add": [ "$$this", 1 ] } ]
              }
            }}
          ]
        }
      }
    }
  }},
  { "$unwind": "$docs" },
  { "$replaceRoot": { "newRoot": "$docs" } }
])

That is neither a performant solution or "safe" considering that larger result sets run the very real probability of breaching the 16MB BSON limit. As a "golden rule", anything that proposes to put ALL content within the array of a single document:

{ "$group": {
  "_id": null,
  "docs": { "$push": "$$ROOT" }
}}

then that is a basic flaw and therefore not a solution.


Conclusion

The far more conclusive ways to handle this typically would be post processing on the running cursor of results:

var globalAmount = 0;

db.collection.aggregate([
  { $group: {
    "_id": { 
      y: { $year:"$dateEntry"}, 
      m: { $month:"$dateEntry"}, 
      d: { $dayOfMonth:"$dateEntry"} 
    }, 
    amount: { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } }
]).map(doc => {
  globalAmount += doc.amount;
  return Object.assign(doc, { amount: globalAmount });
})

So in general it's always better to:

  • Use cursor iteration and a tracking variable for totals. The mapReduce sample is a contrived example of the simplified process above.

  • Use pre-aggregated totals. Possibly in concert with cursor iteration depending on your pre-aggregation process, whether that is just interval total or a "carried forward" running total.

The aggregation framework should really be used for "aggregating" and nothing more. Forcing coercions on data via processes like manipulating into an array just to process how you want is neither wise or safe, and most importantly the client manipulation code is far cleaner and more efficient.

Let databases do the things they are good at, as you "manipulations" are far better handled in code instead.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • 1
    Thanks Neil, The update to the answer is exactly what I have done. Being new to MongoDB (started this friday only), I was looking for "modern" solutions. I only have one question. I am doing bulk insert to a collection, so that aggregation works only on delta (newer) data. Is there a performance impact I should be aware of. – alok ailawadi Mar 25 '19 at 05:07
6

Starting in Mongo 5, it's a perfect use case for the new $setWindowFields aggregation operator:

// { day: "2015-01-06", "amount": 50 }
// { day: "2015-01-07", "amount": 100 }
// { day: "2015-01-09", "amount": 11 }
db.collection.aggregate([
  { $setWindowFields: {
    sortBy: { day: 1 },
    output: {
      cumulative: {
        $sum: "$amount",
        window: { documents: [ "unbounded", "current" ] }
      }
    }
  }}
])
// { day: "2015-01-06", amount: 50,  cumulative: 50 }
// { day: "2015-01-07", amount: 100, cumulative: 150 }
// { day: "2015-01-09", amount: 11,  cumulative: 161 }

This:

  • adds the cumulative field in each document (output: { cumulative: { ... }})
  • which is the $sum of amounts ($sum: "$amount")
  • on a specified span of documents (the window)
    • which is in our case any previous document: window: { documents: [ "unbounded", "current" ] } } in the collection.
    • as defined by [ "unbounded", "current" ] meaning the window is all documents seen between the first document (unbounded) and the current document (current).
  • Also note that we've made sure to sort documents by day (sortBy: { day: 1 }).

And here is the full query for your exact question (using an initial $group to group your documents by day with the sum of their amounts):

// { date: ISODate("2015-01-06T23:00:00Z"), "amount": 50 },
// { date: ISODate("2015-01-07T23:00:00Z"), "amount": 100 },
// { date: ISODate("2015-01-09T23:00:00Z"), "amount": 11 },
// { date: ISODate("2015-01-09T23:00:00Z"), "amount": 2 }
db.collection.aggregate([
  { $group: {
    _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
    "amount": { "$sum": "$amount" } }
  },
  { $setWindowFields: {
    sortBy: { _id: 1 },
    output: {
      cumulative: {
        $sum: "$amount",
        window: { documents: [ "unbounded", "current" ] }
      }
    }
  }}
])
// { _id: "2015-01-06", amount: 50,  cumulative: 50 }
// { _id: "2015-01-07", amount: 100, cumulative: 150 }
// { _id: "2015-01-09", amount: 13,  cumulative: 163 }
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190