
I have a mongodb/mongoskin aggregation request as follows:

db.collection(customerTable + '_earnedinfluencers').aggregate([
    {
        $group: {
            _id: '$user',
            name: '', // to be filled separately
            username: '', // to be filled separately 
            picture: '', // to be filled separately
            city: '', // to be filled separately
            kids: { $sum: '$kids' },
            revenue: { $sum: '$dayIncome' },
            kidsRevRatio: { $divide: [ { $sum: '$kids' }, { $sum: '$dayIncome' } ] }
        }, 

        $match: {
            richness: { $gte: variable1 },
            kids: { $lt: variable2 },
            hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
            event: { $in: schoolfestival },
            event: { $ne: 0 }
        },

        $project: {
            _id: 0,
            user: '$_id',
            name: 1,
            username: 1,
            picture: 1,
            city: 1,
            kids: 1,
            revenue: 1,
            kidsRevRatio: 1
        }
    }
], function(err, result) {
    // do something with err and result
});

The above code gives the following error:

Error: {"name":"MongoError","errmsg":"exception: A pipeline stage specification object must contain exactly one field.","code":16435,"ok":0}

I'm new to mongo and db in general, and can't tell what I did wrong.

Neil Lunn
jeebface
  • Each operation (e.g. `$group`, `$match`, `$project`) should be passed as a separate object, you can't combine them all into one object. – Leonid Beschastny Aug 18 '14 at 23:04
  • You can find examples here: http://docs.mongodb.org/manual/applications/aggregation/ – Leonid Beschastny Aug 18 '14 at 23:06
  • looks like i misread the example. thanks for pointing that out. but now i'm getting "exception: the group aggregate field 'name' must be defined as an expression inside an object" error... – jeebface Aug 18 '14 at 23:10

1 Answer


Your pipeline arguments are unbalanced: each stage is a separate document, so you need to wrap each one in its own object. There are also a number of other problems, which this version addresses:

db.collection(customerTable + '_earnedinfluencers').aggregate([

    { $match: {
            richness: { $gte: variable1 },
            kids: { $lt: variable2 },
            hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
            event: { $in: schoolfestival },
    }},

    { $group: {
            _id: '$user',
            name: { '$first': '$name' },
            username: { '$first': '$username' },
            picture: { '$first': '$picture' },
            city: { '$first': '$city' },
            kids: { '$sum': '$kids' },
            revenue: { '$sum': '$dayIncome' },
            kidsSum: { '$sum': '$kids' },
    }}, 


    { $project: {
            _id: 0,
            user: '$_id',
            name: 1,
            username: 1,
            picture: 1,
            city: 1,
            revenue: 1,
            kidsSum: 1,
            kidsRevRatio: { $divide: [ '$kidsSum', '$revenue' ] }
    }}
], function(err, result) {
    // do something with err and result
});

You had the whole pipeline as one document, whereas it requires an array of documents, one per stage, as shown.
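To make the "exactly one field" rule concrete, here is a minimal sketch in plain JavaScript. The helper `findMalformedStages` is hypothetical (it is not part of MongoDB or mongoskin) and only mimics the shape check behind error 16435; the filter values are placeholders.

```javascript
// Hypothetical helper (not a MongoDB or mongoskin API): returns the indexes
// of pipeline stages that do not contain exactly one field.
function findMalformedStages(pipeline) {
    return pipeline
        .map(function (stage, index) {
            return Object.keys(stage).length === 1 ? -1 : index;
        })
        .filter(function (index) { return index !== -1; });
}

// Shape from the question: one object holding three stage operators.
var combined = [
    { $group: { _id: '$user' }, $match: { kids: { $lt: 5 } }, $project: { _id: 0 } }
];

// Corrected shape: one object per stage.
var separated = [
    { $match: { kids: { $lt: 5 } } },
    { $group: { _id: '$user' } },
    { $project: { _id: 0, user: '$_id' } }
];

console.log(findMalformedStages(combined));  // [ 0 ]
console.log(findMalformedStages(separated)); // []
```

Running a check like this before calling aggregate() is optional; the server reports the same problem, just less readably.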

You also want the $match to come first so that documents are filtered before they are grouped. If you need additional matching after the $group, add another $match stage to the pipeline afterwards.
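As a rough sketch of that second case, the pipeline below filters raw documents first and then filters again on a value that only exists after grouping. The variable `maxKids` and the choice of fields are assumptions made for illustration, not taken from your schema.

```javascript
// Placeholder standing in for variable2 from the question (assumed value).
var maxKids = 10;

var pipeline = [
    // First $match: runs against the raw documents, before any grouping.
    { $match: { hobbies: { $in: ['hobby1', 'hobby2'] } } },

    // $group: produces one document per user with a summed field.
    { $group: { _id: '$user', kidsSum: { $sum: '$kids' } } },

    // Second $match: filters on the summed value, which only exists now.
    { $match: { kidsSum: { $lt: maxKids } } }
];
```

The array would be passed to aggregate() in place of the inline literal.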

The $group operation requires every field other than the _id grouping key to use a "grouping operator". You cannot just get fields back on their own unless they are part of the _id you group on. Typically you want an operator such as $first, or you omit the fields completely.

Only grouping operators are allowed at the top level of a $group field, and $divide is not a grouping operator. To do this sort of calculation on one or more $sum results, move it to a later $project that uses the fields holding the calculated values.
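The two-step shape (accumulate in $group, then calculate in $project) can be sketched in plain JavaScript. This is only a stand-in for what the stages do, not how MongoDB implements them; the sample documents and the function names `groupByUser` and `projectRatio` are made up for illustration.

```javascript
// Sample documents, invented for illustration.
var docs = [
    { user: 'a', name: 'Ann', kids: 2, dayIncome: 100 },
    { user: 'a', name: 'Ann', kids: 1, dayIncome: 50 },
    { user: 'b', name: 'Bob', kids: 4, dayIncome: 100 }
];

// Rough stand-in for $group: $first for name, $sum for the two totals.
function groupByUser(input) {
    var groups = {};
    input.forEach(function (doc) {
        var g = groups[doc.user];
        if (!g) {
            // The first document seen for a user supplies the $first value.
            g = groups[doc.user] = { _id: doc.user, name: doc.name, kidsSum: 0, revenue: 0 };
        }
        g.kidsSum += doc.kids;
        g.revenue += doc.dayIncome;
    });
    return Object.keys(groups).map(function (key) { return groups[key]; });
}

// Rough stand-in for the later $project: the ratio is computed from the
// finished sums, which is why it cannot be expressed inside $group.
function projectRatio(grouped) {
    return grouped.map(function (g) {
        return {
            user: g._id,
            name: g.name,
            revenue: g.revenue,
            kidsSum: g.kidsSum,
            kidsRevRatio: g.kidsSum / g.revenue
        };
    });
}

var result = projectRatio(groupByUser(docs));
console.log(result);
```

The ratio needs both totals to be complete, so it belongs in a stage that runs after the accumulation has finished.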

Also, operations like $project and $group "remove" fields from the pipeline and preserve only those you specifically include, so you cannot reference a field that is no longer there. This is one reason why the $match comes first. The other is that a $match can use an index only when it is at the start of the pipeline.

For each stage, the only fields present will be those mentioned. As a further optimization, the "optimizer" will from the start leave out any fields in your documents that are not specifically mentioned. So only the fields referenced in the first $match and $group stages combined are carried into the rest of the pipeline stages, where they may be filtered down again.
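A small plain-JavaScript sketch of what "only those mentioned" means for a later stage. The helper `includeFields` is hypothetical and simplified: a real $project also keeps _id unless you exclude it, which this stand-in does not model.

```javascript
// Hypothetical stand-in for an inclusion-style $project: keep only the
// listed fields and drop everything else.
function includeFields(doc, fields) {
    var out = {};
    fields.forEach(function (field) {
        if (Object.prototype.hasOwnProperty.call(doc, field)) {
            out[field] = doc[field];
        }
    });
    return out;
}

// A document as it might look after the $group stage (invented values).
var grouped = { _id: 'a', name: 'Ann', kidsSum: 3, revenue: 150 };

// Only name and revenue are requested, so kidsSum is gone afterwards.
var projected = includeFields(grouped, ['name', 'revenue']);

console.log(projected.kidsSum); // undefined
```

Any stage placed after this one could no longer reference kidsSum, which is why kidsRevRatio is calculated in the same $project that still has both sums available.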

Neil Lunn
  • A commenter earlier pointed this out as well and the error seems to be gone, only to be replaced by another error: "exception: the group aggregate field 'name' must be defined as an expression inside an object". any idea what this is about? i don't see any name being declared outside of an object. – jeebface Aug 18 '14 at 23:39
  • @jeebface There were actually quite a few things wrong and I was in the middle of typing and also answering another question. There is a full explanation here now. – Neil Lunn Aug 18 '14 at 23:58