0

I'm new to mongoDB, the collection I'm working with was constructed by a python file that accesses my gmail using the gmail API.I'm trying to make a query to find the time of each day of the week with the highest number of received emails. Here was my attempt:

db.gmail_data2.aggregate(
[{$unwind: '$payload.headers' },
{$match:{'payload.headers.name':'Date'}},
{$project:{email_hour:{$hour:"payload.headers.value"}}},
{$group:{_id:"$email_hour",total:{$max:{$sum:1}}}}])

Here is what an email object looks like:

{
 "id": string,
 "threadId": string,
 "labelIds": [
  string
],
"snippet": string,
"historyId": unsigned long,
"internalDate": long,
"payload": {
 "partId": string,
 "mimeType": string,
 "filename": string,
 "headers": [
  {
    "name": string,
    "value": string
  }
],
"body": users.messages.attachments Resource,
"parts": [
  (MessagePart)
]
},
"sizeEstimate": integer,
"raw": bytes
}

The date of the email is in the payload.headers portion of the object where "name" is date and "value" is the date in ISODate format. The query works without the $max command and gives the number of emails per hour out of all emails. Once I put the $max command it out puts: { "_id" : hour, "total" : 1 } for every hour.

Hamon148
  • 137
  • 1
  • 12

1 Answers1

1

you'll need to do something like this. First $group stage to group the emails by date and hour while counting emails for each hour and the final group to group by date and pick the max email count for each day while pushing the email count by hour into an array. $project stage to $filter the max email hour row by comparing the previous email max count with the array's count and return the matching value. You can add another project stage at the end if you would like to only get the hour.

aggregate(
    [{
        $unwind: '$payload.headers'
    }, {
        $match: {
            'payload.headers.name': 'Date'
        }
    }, {
        $group: {
            _id: {
                email_date: {
                    $dateToString: {
                        format: "%Y-%m-%d",
                        date: '$payload.headers.value'
                    }
                },
                email_hour: {
                    $hour: '$payload.headers.value'
                }
            },
            count: {
                $sum: 1
            }
        }
    }, {
        $group: {
            _id: '$_id.email_date',
            email_by_hour: {
                $push: {
                    email_hour: '$_id.email_hour',
                    count: '$count'
                }
            },
            max_count: {
                $max: '$count'
            }
        }
    }, {
        $project: {
            _id: 0,
            email_date: '$_id',
            max_email_hour: {
                $filter: {
                    input: '$email_by_hour',
                    as: 'item',
                    cond: {
                        $eq: ['$$item.count', '$max_count']
                    }
                }
            }
        }
    }])

Sample Output:

{ "email_date" : "2016-11-21", "max_email_hour" : [ { "email_hour" : 1, "count" : 3 } ] }
{ "email_date" : "2016-11-20", "max_email_hour" : [ { "email_hour" : 12, "count" : 2 } ] }
s7vr
  • 73,656
  • 11
  • 106
  • 127