0

I have a collection with documents like the following:

{
    name: "Johann",
    surname: "Esburg",
    jobs: [
       {
           "profession": "chef",
           "salary": 1000
       }, {
           "profession": "gardener",
           "salary": 800
       }
    ]
},
{
    name: "Sam",
    surname: "Sonite",
    jobs: [
       {
           "profession": "doctor",
           "salary": 2000
       }, {
           "profession": "barber",
           "salary": 850
       }
    ]
}

I would like to find all the jobs that have a salary greater than 900, So that the result would be:

[
    {
        "profession": "chef",
        "salary": 1000
    }, {
        "profession": "doctor",
        "salary": 2000
    }
]

I'm quite sure that I must resort to mongodb aggregation. The best I managed to achieve is:

db.persons.aggregate([
    {$unwind: "$jobs"},
    {$match: {"jobs.salary": {$gt: 900}}},
    {$project: {jobs: 1, _id: 0}}])

That returns:

[
    {
        "jobs": {
            "profession": "chef",
            "salary": 1000
         }
    },
    {
        "jobs": {
            "profession": "doctor",
            "salary": 2000
        }
    }
]

But this is not what I want. I want also the jobs key to be removed. I know one possibility is to specify each variable of the embedded document in the projection like this:

db.persons.aggregate([
    {$unwind: "$jobs"},
    {$match: {"jobs.salary": {$gt: 900}}},
    {$project:
        {"profession": "$jobs.profession",
         "salary": "$jobs.salary", _id: 0}}])

but I would prefer to avoid it, since the embedded documents could differ on some fields.

tano
  • 836
  • 1
  • 10
  • 25
  • 1
    group by `_id:null` like this `{"$unwind":"$jobs"},{"$match":{"jobs.salary": {$gt: 900}}},{"$group":{"_id":null,"results":{"$push":"$jobs"}}}` – Neo-coder Oct 07 '15 at 10:24
  • @Yogesh It is a good approximation of what I wanted. Thanks – tano Oct 07 '15 at 10:31

1 Answers1

0

Consider the following aggregation pipeline operation, the $match pipeline as the first step is necessary to ensure an efficient operation as the aggregation can use an index as well as query the collection to minimise the number of documents that enter the pipeline:

db.persons.aggregate([
    {
        "$match": {
            "jobs.salary": { "$gt": 900 }
        }
    },
    {
        "$unwind": "$jobs"
    }, 
    {
        "$match": {
            "jobs.salary": { "$gt": 900 }
        }
    },   
    {
        "$group": { 
            "_id": null,
            "jobs": {
                "$push": {
                    "profession": "$jobs.profession",
                    "salary": "$jobs.salary"
                }
            }
        }
    },
    {
        "$unwind": "$jobs"
    },
    {
        "$project": {
            "_id": 0, "profession": "$jobs.profession", "salary": "$jobs.salary"
        }
    }
])

Sample Output:

/* 0 */
{
    "result" : [ 
        {
            "profession" : "chef",
            "salary" : 1000
        }, 
        {
            "profession" : "doctor",
            "salary" : 2000
        }
    ],
    "ok" : 1
}
chridam
  • 100,957
  • 23
  • 236
  • 235
  • The result is correct, but it is equivalent to the second solution I exposed in my question. The problem is that I would avoid to specify each field of the result explicitly, that is what you do in the `group` statement. – tano Oct 08 '15 at 09:56
  • @tano The result above is equivalent to the expected outcome you specified in your question, I'm failing to see how that is similar to the undesired second solution. – chridam Oct 08 '15 at 10:00
  • The problem is not the result. The fact is that the object I used in the question is just an example one. My real objects are more complex, they can change over time and some of them can have some fields missing. Therefore it is inconvenient to specify the fields one by one like you do in the `group` and in the `project` statements. – tano Oct 08 '15 at 10:14