1

I have the following documents in my collection. Each document contains historical weather data about a specific location:

{
'location':'new york', 
'history':[
    {'timestamp':1524542400, 'temp':79, 'wind_speed':1, 'wind_direction':'SW'}
    {'timestamp':1524548400, 'temp':80, 'wind_speed':2, 'wind_direction':'SW'}
    {'timestamp':1524554400, 'temp':82, 'wind_speed':3, 'wind_direction':'S'}
    {'timestamp':1524560400, 'temp':78, 'wind_speed':4, 'wind_direction':'S'}
    ]
},
{
'location':'san francisco', 
'history':[
    {'timestamp':1524542400, 'temp':80, 'wind_speed':5, 'wind_direction':'SW'}
    {'timestamp':1524548400, 'temp':81, 'wind_speed':6, 'wind_direction':'SW'}
    {'timestamp':1524554400, 'temp':82, 'wind_speed':7, 'wind_direction':'S'}
    {'timestamp':1524560400, 'temp':73, 'wind_speed':8, 'wind_direction':'S'}
    ]
},
{
'location':'miami', 
'history':[
    {'timestamp':1524542400, 'temp':84, 'wind_speed':9, 'wind_direction':'SW'}
    {'timestamp':1524548400, 'temp':85, 'wind_speed':10, 'wind_direction':'SW'}
    {'timestamp':1524554400, 'temp':86, 'wind_speed':11, 'wind_direction':'S'}
    {'timestamp':1524560400, 'temp':87, 'wind_speed':12, 'wind_direction':'S'}
    ]
}

I would like to get a list of the most recent weather data for each location (more or less) like so:

{
'location':'new york', 
'history':{'timestamp':1524560400, 'temp':78, 'wind_speed':4, 'wind_direction':'S'}
},
{
'location':'san francisco', 
'history':{'timestamp':1524560400, 'temp':73, 'wind_speed':8, 'wind_direction':'S'}
},
{
'location':'miami', 
'history':{'timestamp':1524560400, 'temp':87, 'wind_speed':12, 'wind_direction':'S'}
}

I was pretty sure it needed some sort of $group aggregate but can't figure out how to select an entire object by $max:<field>. For example the below query only returns the max timestamp itself, without any of the accompanying fields.

db.collection.aggregate([{
    '$unwind': '$history'
}, {
    '$group': {
        '_id': '$name',
        'timestamp': {
            '$max': '$history.timestamp'
        }
    }
}])

returns

{ "_id" : "new york", "timestamp" : 1524560400 }
{ "_id" : "san franciscoeo", "timestamp" : 1524560400 }
{ "_id" : "miami", "timestamp" : 1524560400 }

The actual collection and arrays are very large so client side processing won't be ideal. Any help would be much appreciated.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Ben Lamm
  • 603
  • 8
  • 18

1 Answers1

6

Well as the author of the answer you found, I think we can actually do a bit better with modern MongoDB versions.

Single match per document

In short we can actually apply $max to your particular case, used with $indexOfArray and $arrayElemAt to extract the matched value:

db.collection.aggregate([
  { "$addFields": {
    "history": {
      "$arrayElemAt": [
        "$history",
        { "$indexOfArray": [ "$history.timestamp", { "$max": "$history.timestamp" } ] }
      ]
    }
  }}
])

Which will return you:

{
        "_id" : ObjectId("5ae9175564de8a00a66b3974"),
        "location" : "new york",
        "history" : {
                "timestamp" : 1524560400,
                "temp" : 78,
                "wind_speed" : 4,
                "wind_direction" : "S"
        }
}
{
        "_id" : ObjectId("5ae9175564de8a00a66b3975"),
        "location" : "san francisco",
        "history" : {
                "timestamp" : 1524560400,
                "temp" : 73,
                "wind_speed" : 8,
                "wind_direction" : "S"
        }
}
{
        "_id" : ObjectId("5ae9175564de8a00a66b3976"),
        "location" : "miami",
        "history" : {
                "timestamp" : 1524560400,
                "temp" : 87,
                "wind_speed" : 12,
                "wind_direction" : "S"
        }
}

That is of course without actually needing to "group" anything and simply find the $max value from within each document, as you seem to be trying to do. This avoids you needing to "mangle" any other document output by forcing it through a $group or indeed an $unwind.

The usage essentially is that the $max returns the "maximum" value from the specified array property since $history.timestamp is a short way of notating to extract "just those values" from within the objects of the array.

This is used in comparison with the same "list of values" to determine the matching "index" via $indexOfArray, which takes an array as it's first argument and the value to match as the second.

The $arrayElemAt operator also takes an array as it's first argument, here we use the full "$history" array since we want to extract the "full object". Which we do by the "returned index" value of the $indexOfArray operator.

"Multiple" matches per document

Of course that's fine for "single" matches, but if you wanted to expand that to "multiple" matches of the same $max value, then you would use $filter instead:

db.collection.aggregate([
  { "$addFields": {
    "history": {
      "$filter": {
        "input": "$history",
        "cond": { "$eq": [ "$$this.timestamp", { "$max": "$history.timestamp" } ] }
      }
    }
  }}
])

Which would output:

{
        "_id" : ObjectId("5ae9175564de8a00a66b3974"),
        "location" : "new york",
        "history" : [
                {
                        "timestamp" : 1524560400,
                        "temp" : 78,
                        "wind_speed" : 4,
                        "wind_direction" : "S"
                }
        ]
}
{
        "_id" : ObjectId("5ae9175564de8a00a66b3975"),
        "location" : "san francisco",
        "history" : [
                {
                        "timestamp" : 1524560400,
                        "temp" : 73,
                        "wind_speed" : 8,
                        "wind_direction" : "S"
                }
        ]
}
{
        "_id" : ObjectId("5ae9175564de8a00a66b3976"),
        "location" : "miami",
        "history" : [
                {
                        "timestamp" : 1524560400,
                        "temp" : 87,
                        "wind_speed" : 12,
                        "wind_direction" : "S"
                }
        ]
}

The main difference being of course that the "history" property is still an "array" since that is what $filter will produce. Also noting of course that if there were in fact "multiple" entries with the same timestamp value, then this would of course return them all and not just the "first index" matched.

The comparison is basically done instead against "each" array element to see if the "current" ( "$$this" ) object has the specified property which matches the $max result, and ultimately returning only those array elements which are a match for the supplied condition.


These are essentially your "modern" approaches which avoid the overhead of $unwind, and indeed $sort and $group where they may not be needed. Of course they are not needed for just dealing with individual documents.

If however you really need to $group across "multiple documents" by a specific grouping key and consideration of values "inside" the array, then the initial approach outlined as you discovered is actually the fit for that scenario, as ultimately you "must" $unwind to deal with items "inside" an array in such a way. And also with consideration "across documents".

So be mindful to use stages like $group and $unwind only where you actually need to and where "grouping" is your actual intent. If you are just looking to find something "in the document", then there are far more efficient ways to do this without all the additional overhead that those stages bring with them to processing.

Community
  • 1
  • 1
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317