Riiiight. I came up with the following query. Admittedly, I have seen simpler and nicer ones in my life but it certainly gets the job done:
db.getCollection('test').aggregate
(
{
$facet: // split aggregation into two pipelines
{
"in": [
{ "$match": { "inDate": { "$ne": null } } }, // get rid of null values
{ $group: { "_id": { "y": { "$year": "$inDate" }, "m": { "$month": "$inDate" }, "d": { "$dayOfMonth": "$inDate" } }, "cIn": { $sum : 1 } } }, // compute sum per inDate
],
"out": [
{ "$match": { "outDate": { "$ne": null } } }, // get rid of null values
{ $group: { "_id": { "y": { "$year": "$outDate" }, "m": { "$month": "$outDate" }, "d": { "$dayOfMonth": "$outDate" } }, "cOut": { $sum : 1 } } }, // compute sum per outDate
]
}
},
{ $project: { "result": { $setUnion: [ "$in", "$out" ] } } }, // merge results into new array
{ $unwind: "$result" }, // unwind array into individual documents
{ $replaceRoot: { newRoot: "$result" } }, // get rid of the additional field level
{ $group: { _id: { year: "$_id.y", "month": "$_id.m", "day": "$_id.d" }, "countIn": { $sum: "$cIn" }, "countOut": { $sum: "$cOut" } } } // group into final result
)
As always with MongoDB aggregations you can get an idea of what's going on by simply reducing the projection stages step by step starting from the end of the query.
EDIT:
As you can see in the comments below there was a bit of a discussion around document size limits and the general applicability of this solution.
So let's look at those aspects in greater detail and let's also compare the performance of the $facet
based solution to the one based on $map
(suggested by @NeilLunn to avoid potential document size issues).
I created 2 million test records that have random dates assigned to both the "inDate" and the "outDate" field:
{
"_id" : ObjectId("597857e0fa37b3f66959571a"),
"inDate" : ISODate("2016-07-29T22:00:00.000Z"),
"outDate" : ISODate("1988-07-14T22:00:00.000Z")
}
The data range covered was from 01.01.1970 all the way to 01.01.2050, that's a total of 29220 distinct days. Given the random distribution of the 2 million test records across this time range both queries can be expected to return the full 29220 possible results (which both did).
Then I ran both queries five times after restarting my single MongoDB instance freshly and the results in milliseconds I got looked like this:
$facet
: 5663, 5400, 5380, 5460, 5520
$map
: 9648, 9134, 9058, 9085, 9132
I also measured the size of the single document returned by the facet stage which was 3.19MB so reasonably far away from the MongoDB document size limit (16MB at the time of writing) which, however, only applies to the result document anyway and wouldn't be a problem during pipeline processing.
Bottom line: If you want performance, use the solution suggested here. Be careful about the document size limit, though, in particular if your use case is not the exact one described in the question above (e.g. when you need to collect even more/bigger data). Also, I am not sure if in a sharded scenario both solutions still expose the same performance characteristics...