3

In mongodb, I have a collection that contains a single document that looks like the following:

{ 
    "_id" : ObjectId("5552b7fd9e8c7572e36e39df"), 
    "StackSummaries" : [
        {
            "StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXX-30fb22a-285-439ee279-c7c8d36/4ebd8770-f8f4-11e4-bf36-503f2370240f", 
            "TemplateDescription" : "XXXX", 
            "StackStatusReason" : "", 
            "CreationTime" : "2015-05-12T22:14:50.535Z", 
            "StackName" : "XXXX", 
            "StackStatus" : "CREATE_COMPLETE"
        }, 
        {
            "TemplateDescription" : "XXXX", 
            "StackStatusReason" : "", 
            "CreationTime" : "2015-05-11T04:02:05.543Z", 
            "StackName" : "XXXX", 
            "StackStatus" : "DELETE_COMPLETE", 
            "StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXXX/7c8d04e0-f792-11e4-bb12-506726f15f9a"
        },
        { ... },
        { many others }
    ]
}

ie the imported results of the aws cli command aws cloudformation list-stacks

I'm trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE. After much experimenting and reading other SO posts I arrived at the following:

db.cf_list_stacks.aggregate( {$match: {"StackSummaries.StackStatus": "CREATE_COMPLETE"}})

However this still returns the whole document (and I haven't even worried about UPDATE_COMPLETE).

I'm coming from an SQL background and struggling with simple queries like this. Any ideas on how to get the information I'm looking for?

SO posts I've looked at:

Update

Notes on things I learned while understanding this topic:

  • aggregate() is just a pipeline (like a Unix shell pipeline) where each $ operator is just another step. And like shell pipelines they can look complex, but you just build them up step by step until you get the results you want
  • Mongo has a great webinar: Exploring the Aggregation Framework
  • RoboMongo is a good tool (GPL3) for working with Mongo data and queries
Community
  • 1
  • 1
Sonia Hamilton
  • 4,229
  • 5
  • 35
  • 50
  • 1
    Do you only what sub-document that match the given criteria? What is the expected output in your case? – styvane May 13 '15 at 06:41
  • Hi @Michael there's only one document in the collection, so I want to "transform" the document so that the StackSummaries array only contains elements that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE. Probably the other answers will help me, I've just woken up and yet to understand them fully. – Sonia Hamilton May 13 '15 at 20:45

2 Answers2

4

If you only want the object inside the StackSummaries array, you should use the $unwind clause to expand the array, filter the documents you want and then project only the parts of the document that you actually want.

The query would look something like this:

db.cf_list_stacks.aggregate([
    { '$unwind' : '$StackSummaries' },
    { '$match' : { 'StackSummaries.StackStatus' : 'CREATE_COMPLETE' } },
    { '$project' : { 
         'TemplateDescription' : '$StackSummaries.TemplateDescription',
         'StackStatusReason' : '$StackSummaries.StackStatusReason',
         ...
    } }
])

Useful links:

Santiago Alessandri
  • 6,630
  • 30
  • 46
  • Thanks @SanSS and Chridam for your great answers, I really appreciate it. I've upvoted both, it will take me a while to understand the answers before I can vote on a "correct" answer (though they're probably both correct). – Sonia Hamilton May 13 '15 at 20:42
3

With MongoDB 3.4 and newer, you can leverage the $addFields and $filter operators with the aggregation framework to get the desired result.

Consider running the following pipeline:

db.cf_list_stacks.aggregate([
    {
        "$addFields": {
            "StackSummaries": {
                "$filter": {
                    "input": "$StackSummaries",
                    "as": "el":
                    "cond": {
                        "$in": [ 
                            "$$el.StackStatus", 
                            ["CREATE_COMPLETE", "UPDATE_COMPLETE"] 
                        ] 
                    }
                }
            }
        }
    }
]);

For MongoDB 3.2

db.cf_list_stacks.aggregate([
    {
        "$project": {
            "StackSummaries": {
                "$filter": {
                    "input": "$StackSummaries",
                    "as": "el":
                    "cond": {
                        "$or": [
                            { "$eq": ["$$el.StackStatus", "CREATE_COMPLETE"] },
                            { "$eq": ["$$el.StackStatus", "UPDATE_COMPLETE"] }
                        ]
                    }
                }
            }
        }
    }
]);

For MongoDB 3.0 and below

db.cf_list_stacks.aggregate([
    { "$unwind": "$StackSummaries" },
    {
        "$match": {            
            "StackSummaries.StackStatus": {
                "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
            }
        }        
    },
    {
        "$group": {
            "_id": "$_id",
            "StackSummaries": {
                "$addToSet": "$StackSummaries"
            }
        }
    }
])

The above pipeline has the $unwind operator which deconstructs the StackSummaries array field from the input documents to output a document for each element. Each output document replaces the array with an element value.

A further filtering is required after the $unwind to get only the documents that pass the given criteria thus a second $match operator pipeline stage follows.

In order to get the original array field after doing the $unwind bit, you would need to group the documents using the $group operator and within the group you can then use the $addToSet array operator to then push the elements into the array.


Based on the criteria that you are trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE OR UPDATE_COMPLETE, you could use $elemMatch projection but this won't work with the $in operator as required to get the document with StackStatus of CREATE_COMPLETE OR UPDATE_COMPLETE at this time. There is a JIRA issue for this:

db.cf_list_stacks.find(
    {
        "StackSummaries.StackStatus": {
            "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
        }
    }, 
    {
        "StackSummaries": {
            "$elemMatch": {
                "StackStatus": {
                    "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
                }
            }            
        }    
    })

This will only give you documents where the StackStatus has the "CREATE_COMPLETE" value.

chridam
  • 100,957
  • 23
  • 236
  • 235
  • 1
    Thanks SanSS and @Chridam for your great answers, I really appreciate it. I've upvoted both, it will take me a while to understand the answers before I can vote on a "correct" answer (though they're probably both correct). – Sonia Hamilton May 13 '15 at 20:42
  • @chridam I think there is an extra `$match` operation in your aggregation pipeline. She explicitly says that the collection has a single document, so the first `$match` of your aggregation won't save any actual work. – Santiago Alessandri May 14 '15 at 16:23