0

When using $group, is there any way to use a weighted average instead of the $avg group accumulator? I used this (How do I calculate a weighted average in mongoDB using aggregation framework?) for the weighted average but the error I get is "The field 'id:WAVG_A' must specify one accumulator". Is there some sort of hack to use an projection instead of a group accumulator?

A subset of my aggregation JSON is here:

{
   "$group":{
      "_id":{
         "id:DIM_1":"$id:DIM_1",
         "id:DIM_2":"$id:DIM_2"
      },
      "id:WAVG_A":{
         "$group":{
            "_id":"weighted average",
            "nu":{
               "$sum":{
                  "$multiply":[
                     "$id:WAVG_A",
                     "$id:MET_A"
                  ]
               }
            },
            "de":{
               "$sum":"$id:MET_A"
            }
         },
         "$project":{
            "average":{
               "$divide":[
                  "$nu",
                  "$de"
               ]
            }
         }
      },
      "id:MET_A":{
         "$sum":"$id:MET_A"
      }
   }
}

Edit

To simplify things, let me try and explain my problem using data similar to the samples provided on the mongo site. Let’s say you have data like:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "weight" : 1, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "weight" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "weight" : 2, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "weight" : 3, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "weight" : 3, "date" : ISODate("2014-04-04T21:23:13.331Z") }

Instead of using $avg when aggregating the data, you want to use a weighted average (https://support.microsoft.com/en-ca/help/214049/how-to-calculate-weighted-averages-in-excel). If natively supported, it might look like:

db.sales.aggregate(
   [
      {
        $group : {
           _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           averageQuantity: { $weighted_avg: { $value : "$quantity", $weight: "$weight" },
           count: { $sum: 1 }
        }
      }
   ]
)

The idea is that when you average it, you have both the average and a weight. Weighted averages are supported in Mongo, because it supports mathematical operations like multiply and division as per How do I calculate a weighted average in mongoDB using aggregation framework?. But what I can't figure out is how would I use a weighted average when aggregating data. How do I apply the answer from How do I calculate a weighted average in mongoDB using aggregation framework? when I want to use $group. The problem from that answer is that it is a $group and a $project and you can't seem to use that as the accumulator of a $group.

Randar Puust
  • 363
  • 4
  • 14
  • Considering that the issue is you have the syntax completely incorrect, then a "subset" is really not the best way to describe your problem. Instead you should provide some source data, your expected result and explanation of how you expect those results to be obtained. You cannot nest a `$group` within a `$group`, so instead of "part of an attempt" show what needs to happen instead. – Neil Lunn Oct 13 '17 at 22:58

1 Answers1

0

So I have a solution to my problem. I was trying to do too much at once. My solution was to break up the group and project as two separate steps, while putting the numerator and denominator into temporary values and it worked.

{
   "$group":{
      "_id":{
         "id:DIM_1":"$id:DIM_1",
         "id:DIM_2":"$id:DIM_2"
      },
      "id:WAVG_A??num":{
         "$sum":{
            "$multiply":[
               "$id:WAVG_A",
               "$id:MET_A"
            ]
         }
      },
      "id:WAVG_A??den":{
         "$sum":"$id:MET_A"
      },
      "id:MET_A":{
         "$sum":"$id:MET_A"
      }
   }
}

and then

{
   "$project":{
      "id:DIM_1":"$_id.id:DIM_1",
      "id:DIM_2":"$_id.id:DIM_2",
      "id:WAVG_A":{
         "$cond":[
            {
               "$eq":[
                  "$id:WAVG_A??den",
                  0
               ]
            },
            0,
            {
               "$divide":[
                  "$id:WAVG_A??num",
                  "$id:WAVG_A??den"
               ]
            }
         ]
      },
      "id:MET_A":"$id:MET_A"
   }
Randar Puust
  • 363
  • 4
  • 14
  • I've also submitted a feature request to have weighted average added as an accumulator to Mongo. https://jira.mongodb.org/browse/SERVER-31594?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel – Randar Puust Oct 20 '17 at 14:59