0

I'm attempting to group the items in a collection by year/month/day. The grouping should be based on the pubDate and pubTimezoneOffset.

I've got an aggregate pipeline that:

- $project - adds the timezoneOffset to the pubDate   
- $group - groups by the modified pubDate 
- $project - removes the timezoneOffset
- $sort - sorts by pubDate

I tested each stage on it's own and it seems to be some issue with the second $project. In the final output the pubDate is null.

I've been going over it for a few hours now and can't see where I've gone wrong. What am I missing?

The aggregate pipeline:

db.messages.aggregate([
    {
      $project: {
        _id: 1,
        pubTimezoneOffset: 1,
        pubDate: {
          $add: [
            '$pubDate', {
              $add: [
                { $multiply: [ '$pubTimezoneOffset.hours', 60, 60, 1000 ] },
                { $multiply: [ '$pubTimezoneOffset.minutes', 60, 1000 ] }
              ]
            }
          ]
        }
      }
    },
    { 
      $group: {
        _id: {
          year: { $year: '$pubDate' },
          month: { $month: '$pubDate' },
          day: { $dayOfMonth: '$pubDate' }
        },
        count: { $sum: 1 },
        messages: {
          $push: {
            _id: '$_id',
            pubTimezoneOffset: '$pubTimezoneOffset',
            pubDate: '$pubDate'
          }      
        }
      }
    },
    {
      $project: {
        _id: 1,
        messages: {
          _id: 1,
          pubTimezoneOffset: 1,
          pubDate: {
            $subtract: [
              '$pubDate', {
                $add: [
                  { $multiply: [ '$pubTimezoneOffset.hours', 60, 60, 1000 ] },
                  { $multiply: [ '$pubTimezoneOffset.minutes', 60, 1000 ] }
                ]
              }
            ]
          }
        },
        count: 1
      }  
    },
    {
      $sort: {
        '_id.year': -1,
        '_id.month': -1,
        '_id.day': -1
      }
    }
]).pretty();

To recreate the source data:

    db.messages.insertOne({ 
      pubDate: ISODate('2017-10-25T10:00:00:000Z'),
      pubTimezoneOffset: {
        hours: -7,
        minutes: 0
      }
    });

    db.messages.insertOne({
      pubDate: ISODate('2017-10-25T11:00:00:000Z'),
      pubTimezoneOffset: {
        hours: -7,
        minutes: 0
      }
    });

    db.messages.insertOne({
      pubDate: ISODate('2017-10-24: 10:00:00:000Z'),
      pubTimezoneOffset: {
        hours: -7,
        minutes: 0
      }
    });

    db.messages.insertOne({
      pubDate: ISODate('2017-10-24: 11:00:00:000Z'),
      pubTimezoneOffset: {
        hours: -7,
        minutes: 0
      }
    });

Running it in mongo shell outputs:

{
    "_id" : {
        "year" : 2017,
        "month" : 10,
        "day" : 25
    },
    "count" : 2,
    "messages" : [
        {
            "_id" : ObjectId("59f0e8b47d0a206bdfde87b3"),
            "pubTimezoneOffset" : {
                "hours" : -7,
                "minutes" : 0
            },
            "pubDate" : null
        },
        {
            "_id" : ObjectId("59f0e8b47d0a206bdfde87b4"),
            "pubTimezoneOffset" : {
                "hours" : -7,
                "minutes" : 0
            },
            "pubDate" : null
        }
    ]
}
{
    "_id" : {
        "year" : 2017,
        "month" : 10,
        "day" : 23
    },
    "count" : 2,
    "messages" : [
        {
            "_id" : ObjectId("59f0e8b47d0a206bdfde87b5"),
            "pubTimezoneOffset" : {
                "hours" : -7,
                "minutes" : 0
            },
            "pubDate" : null
        },
        {
            "_id" : ObjectId("59f0e8b47d0a206bdfde87b6"),
            "pubTimezoneOffset" : {
                "hours" : -7,
                "minutes" : 0
            },
            "pubDate" : null
        }
    ]
}
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • What you are missing is you converted the `Date` to "year", "month" and "day" numbers. So your next "date math" expression expecting a `Date` returns `null`. This is the wrong approach anyway. Timezone has no business being "stored", and should come from the "current locale" of the client for conversion instead. See [Group by Date with Local Time Zone in MongoDB](https://stackoverflow.com/a/45093686/2313887) for the general considerations and handling, notably "keeping and rounding a BSON Date". – Neil Lunn Oct 25 '17 at 21:28
  • FYI MongoDB 3.6 adds support for Timezones into date aggregation operators. – Neil Lunn Oct 25 '17 at 21:29

1 Answers1

0

Kudos for the attempt but, you actually have quite a few things conceptually incorrect here, with the basic error you are seeing is because your premise of "array projection" is incorrect. You are trying to refer to variables "inside the array" by simply notating the "property name".

What you actually need to do here is apply $map in order to apply the functions to "transform" each element:

db.messages.aggregate([
  { "$project": {
    "pubTimezoneOffset": 1,
    "pubDate": {
      "$add": [
        "$pubDate",
        { "$add": [
          { "$multiply": [ '$pubTimezoneOffset.hours', 60 * 60 * 1000 ] },
          { "$multiply": [ '$pubTimezoneOffset.minutes', 60 * 1000 ] }
        ]}
      ]
    }
  }},
  { "$group": {
    "_id": {
      "year": { "$year": "$pubDate" },
      "month": { "$month": "$pubDate" },
      "day": { "$dayOfMonth": "$pubDate" }
    },
    "count": { "$sum": 1 },
    "messages": {
      "$push": {
        "_id": "$_id",
        "pubTimezoneOffset": "$pubTimezoneOffset",
        "pubDate": "$pubDate"
      }      
    }
  }},
  { "$project": {
    "messages": {
      "$map": {
        "input": "$messages",
        "as": "m",
        "in": {
          "_id": "$$m._id",
          "pubTimezoneOffset": "$$m.pubTimezoneOffset",
          "pubDate": {
            "$subtract": [
              "$$m.pubDate",
              { "$add": [
                { "$multiply": [ "$$m.pubTimezoneOffset.hours", 60 * 60 * 1000 ] },
                { "$multiply": [ "$$m.pubTimezoneOffset.minutes", 60 * 1000 ] }
              ]}
            ]
          }
        }
      }
    },
    "count": 1
  }},
  { "$sort": { "_id": -1 } }
]).pretty();

Noting here that you are doing a lot of unnecessary work in "tranforming" the dates kept in the array, and then trying to "tranform" them back to the original state. Instead, you should have simply supplied a "variable" with $let to the _id of $group and left the original document state "as is" using $$ROOT instead of naming all the fields:

db.messages.aggregate([
  { "$group": {
    "_id": {
      "$let": {
        "vars": {
          "pubDate": {
            "$add": [
              "$pubDate",
              { "$add": [
                { "$multiply": [ '$pubTimezoneOffset.hours', 60 * 60 * 1000 ] },
                { "$multiply": [ '$pubTimezoneOffset.minutes', 60 * 1000 ] }
              ]}
            ]
          }
        },
        "in": {
          "year": { "$year": "$$pubDate" },
          "month": { "$month": "$$pubDate" },
          "day": { "$dayOfMonth": "$$pubDate" }
        }   
      }
    },
    "docs": { "$push": "$$ROOT" }      
  }},
  { "$sort": { "_id": -1 } }
])

Also note that $sort simply does actually consider all the "sub-keys" anyway, so there is no need to name them explicitly.

Back to your error, the point of $map is essentially because whilst you can notate array "field inclusion" with MongoDB 3.2 and above like this:

"messages": {
  "_id": 1,
  "pubTimeZoneOffset": 1
}

The thing you cannot do is actually "calculate values" on the elements themselves. You tried "$pubDate" which actually looks in the "ROOT" space for a property of that name, which does not exist and is null. If you then tried:

"messages": {
  "_id": 1,
  "pubTimeZoneOffset": 1,
  "pubDate": "$messages.pubDate"
}

Then you would get "a result", but not the result you might think. Because what would actually be included in "every element" is the value of that property in each array element as a "new array" itself.

So the short and sweet is use $map instead, which iterates the array elements with a local variable referring to the current element for you to notate values for in expressions.


MongoDB 3.6

MongoDB date operators are all timezone aware. So instead of all the juggling then all you need do is supply the additional "timezone" parameter to any option and the conversion will be done for you.

As a sample:

db.messages.aggregate([
  { "$group": {
    "_id": {
      "$dateToString": {
        "date": "$pubDate",
        "format": "%Y-%m-%d",
        "timezone": {
          "$concat": [
            { "$cond": {
              "if": { "$gt": [ "$pubTimezoneOffset", 0 ] },
              "then": "+",
              "else": "-"
            }},
            { "$let": {
              "vars": {
                "hours": { "$substr": [{ "$abs": "$pubTimezoneOffset.hours" },0,2] },
                "minutes": { "$substr": [{ "$abs": "$pubTimezoneOffset.minutes" },0,2] }
              },
              "in": {
                "$concat": [
                  { "$cond": {
                    "if": { "$eq": [{ "$strLenCP": "$$hours" }, 1 ] },
                    "then": { "$concat": [ "0", "$$hours" ] },
                    "else": "$$hours"
                  }},
                  ":",
                  { "$cond": {
                    "if": { "$eq": [{ "$strLenCP": "$$minutes" }, 1 ] },
                    "then": { "$concat": [ "0", "$$minutes" ] },
                    "else": "$$minutes"
                  }}
                ]
              }
            }}
          ]
        }
      }
    },
    "docs": { "$push": "$$ROOT" }
  }},
  { "$sort": { "_id": -1 } }
])

Note that most of the "juggling" in there is to convert your own "offset" to the "string" format required by the new operators. If you simply stored this as "offset": "-07:00" then you can instead simply write:

db.messages.aggregate([
  { "$group": {
    "_id": {
      "$dateToString": {
        "date": "$pubDate",
        "format": "%Y-%m-%d",
        "timezone": "$offset"
      }
    },
    "docs": { "$push": "$$ROOT" }
  }},
  { "$sort": { "_id": -1 } }
])

Please Reconsider

I can't let this pass without making a note that your general approach here is conceptually incorrect. Storing "offset" or "local time string" within the database is just intrinsically wrong.

The date information should be stored as UTC and should be returned as UTC. Sure you can and "should" covert when aggregating, but the general premise is that you always convert back to UTC. And "conversion" comes from the "locale of the observer" and not a "stored" adjustment. Because dates are always relative to the "observer" point of view, and are not from the "point of origin" as you seem to have interpreted it.

I put some lengthy detail on this on Group by Date with Local Time Zone in MongoDB about why you store this way and why "locale" conversion from the "observer" is necessary. That also details "Daylight savings considerations" from the observer point of view.

The basic premise there still remains the same when MongoDB becomes "timezone aware" in that you :

  1. Store in UTC
  2. Query with local time converted to UTC
  3. Aggregate converted from the "observer" offset
  4. Convert the "offset" back to UTC

Because at the end of the day it's the "clients" job to supply that "locale" conversion, since that's the part that "knows where it is".

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Do you think my approach is conceptually correct for a use case where the data needs to be presented using the user's timezone rather than the observer's? – user3381520 Oct 26 '17 at 16:22
  • @user3381520 If you have a new question then [Ask a new Question](https://stackoverflow.com/questions/ask). This answers the question you actually asked. correcting your application of methods on array members. – Neil Lunn Oct 26 '17 at 21:15