66

I have some docs in mongo that looks something like this:

{
  _id : ObjectId("..."),
  "make" : "Nissan",
  ..
},
{
  _id : ObjectId("..."),
  "make" : "Nissan",
  "saleDate" :  ISODate("2013-04-10T12:39:50.676Z"),
  ..
}

Ideally, I'd like to be able to count, by make, the number of vehicles sold per day. I'd then like to view either today, or a window such as today through the last seven days.

I was able to accomplish the daily view with some ugly code

db.inventory.aggregate(
  { $match : { "saleDate" : { $gte: ISODate("2013-04-10T00:00:00.000Z"), $lt: ISODate("2013-04-11T00:00:00.000Z")  } } } ,
  { $group : { _id : { make : "$make", saleDayOfMonth : { $dayOfMonth : "$saleDate" } }, cnt : { $sum : 1 } } }
)

Which then yields the results

{
  "result" : [
    {
      "_id" : {
        "make" : "Nissan",
        "saleDayOfMonth" : 10
      },
      "cnt" : 2
    },
    {
      "_id" : {
        "make" : "Toyota",
        "saleDayOfMonth" : 10
      },
      "cnt" : 4
    },
  ],
  "ok" : 1
}

So that is ok, but I would much prefer to not have to change the two datetime values in the query. Then, as I mentioned above, I'd like to be able to run this query (again, without having to modify it each time) and see the same results binned by day over the last week.

Oh and here is the sample data I've been using for the query

db.inventory.save({"make" : "Nissan","saleDate" :  ISODate("2013-04-10T12:39:50.676Z")});
db.inventory.save({"make" : "Nissan"});
db.inventory.save({"make" : "Nissan","saleDate" :  ISODate("2013-04-10T11:39:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-09T11:39:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:38:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:37:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:36:50.676Z")});
db.inventory.save({"make" : "Toyota","saleDate" :  ISODate("2013-04-10T11:35:50.676Z")});

Thanks in advance, Kevin

Kevin
  • 1,420
  • 2
  • 13
  • 11
  • What's the meaning of 676Z? – Aboozar Rajabi Mar 18 '17 at 09:04
  • Just an update (from 2017, wow this question is old..), I've changed the accepted answer as the Mongo framework has evolved to solve this much easier. Nonetheless props to Asya for her original answer. @AboozarRajabi, "676Z" is an optional part of an [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) time format, in this case "676Z" represents two parts of 2013-04-10T11:35:50.676Z, first 676 is milliseconds, and "Z" is a shortcut for denoting the UTC timezone. – Kevin Mar 19 '17 at 20:16
  • Actually, it's evolved even further and as of 3.6 (released in 2017) you no longer need to convert the date to string to do this, not to mention that I just noticed none of us included the second part of your question, which is how to view "window such as today through the last seven days" - last seven days, especially including days which had no sales is a bit different than the base part of this question. – Asya Kamsky Apr 08 '18 at 15:10
  • I'm updating my answer based on 3.6 and including how to add dates on which nothing happened (0 sales). – Asya Kamsky Apr 08 '18 at 16:00
  • @Kevin you really ought to reconsider reverting the accepted answer. – Asya Kamsky Apr 01 '20 at 22:56

3 Answers3

102

In Mongo 2.8 RC2 there is a new data aggregation operator: $dateToString which can be used to group by a day and simply have a "YYYY-MM-DD" in the result:

Example from the documentation:

db.sales.aggregate(
  [
     {
         $project: {
                yearMonthDay: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
                time: { $dateToString: { format: "%H:%M:%S:%L", date: "$date" } }
         }
     }
  ]
)

will result in:

{ "_id" : 1, "yearMonthDay" : "2014-01-01", "time" : "08:15:39:736" }
ephigenia
  • 1,035
  • 1
  • 8
  • 4
  • 6
    According to documentation, $dateToString returns a string, so you may loose some possibility you'll have with date object – Constantin Guay Dec 02 '15 at 09:43
  • 3
    Sorry for the bump, but how does time zones play into this? If I read this right this will will group by UTC yearMonthDay. Any idea how to take zones into account? – Sivli Nov 20 '16 at 19:30
  • as of 3.6 there is support for timezone conversion in various date operators. – Asya Kamsky Mar 28 '18 at 16:58
  • instead of $project, you can use $addFields then create a new field in the document in the pipeline having just the date without time. This can be useful for $group on the date - i.e., find the last record of the day, etc. Future pipeline operators can continue to use the original field 'date' for sorting, etc. with no loss in fidelity. – barrypicker Oct 24 '19 at 22:04
62

UPDATE The updated answer is based on date features in 3.6 as well as showing how to include dates in the range which had no sales (which wasn't mentioned in any original answers including mine).

Sample data:

db.inventory.find()
{ "_id" : ObjectId("5aca30eefa1585de22d7095f"), "make" : "Nissan", "saleDate" : ISODate("2013-04-10T12:39:50.676Z") }
{ "_id" : ObjectId("5aca30eefa1585de22d70960"), "make" : "Nissan" }
{ "_id" : ObjectId("5aca30effa1585de22d70961"), "make" : "Nissan", "saleDate" : ISODate("2013-04-10T11:39:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70962"), "make" : "Toyota", "saleDate" : ISODate("2013-04-09T11:39:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70963"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:38:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70964"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:37:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70965"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:36:50.676Z") }
{ "_id" : ObjectId("5aca30effa1585de22d70966"), "make" : "Toyota", "saleDate" : ISODate("2013-04-10T11:35:50.676Z") }
{ "_id" : ObjectId("5aca30f9fa1585de22d70967"), "make" : "Toyota", "saleDate" : ISODate("2013-04-11T11:35:50.676Z") }
{ "_id" : ObjectId("5aca30fffa1585de22d70968"), "make" : "Toyota", "saleDate" : ISODate("2013-04-13T11:35:50.676Z") }
{ "_id" : ObjectId("5aca3921fa1585de22d70969"), "make" : "Honda", "saleDate" : ISODate("2013-04-13T00:00:00Z") }

Defining startDate and endDate as variables and using them in aggregation:

startDate = ISODate("2013-04-08T00:00:00Z");
endDate = ISODate("2013-04-15T00:00:00Z");

db.inventory.aggregate([
  { $match : { "saleDate" : { $gte: startDate, $lt: endDate} } },
  {$addFields:{
     saleDate:{$dateFromParts:{
                  year:{$year:"$saleDate"},
                  month:{$month:"$saleDate"},
                  day:{$dayOfMonth:"$saleDate"}
     }},
     dateRange:{$map:{
        input:{$range:[0, {$subtract:[endDate,startDate]}, 1000*60*60*24]},
        in:{$add:[startDate, "$$this"]}
     }}
  }},
  {$unwind:"$dateRange"},
  {$group:{
     _id:"$dateRange", 
     sales:{$push:{$cond:[
                {$eq:["$dateRange","$saleDate"]},
                {make:"$make",count:1},
                {count:0}
     ]}}
  }},
  {$sort:{_id:1}},
  {$project:{
     _id:0,
     saleDate:"$_id",
     totalSold:{$sum:"$sales.count"},
     byBrand:{$arrayToObject:{$reduce:{
        input: {$filter:{input:"$sales",cond:"$$this.count"}},
        initialValue: {$map:{input:{$setUnion:["$sales.make"]}, in:{k:"$$this",v:0}}}, 
        in:{$let:{
           vars:{t:"$$this",v:"$$value"},
           in:{$map:{
              input:"$$v",
              in:{
                 k:"$$this.k",
                 v:{$cond:[
                     {$eq:["$$this.k","$$t.make"]},
                     {$add:["$$this.v","$$t.count"]},
                     "$$this.v"
                 ]}
              }
           }}
        }}
     }}}
  }}
])

On sample data this gives results:

{ "saleDate" : ISODate("2013-04-08T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }
{ "saleDate" : ISODate("2013-04-09T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-10T00:00:00Z"), "totalSold" : 6, "byBrand" : { "Nissan" : 2, "Toyota" : 4 } }
{ "saleDate" : ISODate("2013-04-11T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-12T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }
{ "saleDate" : ISODate("2013-04-13T00:00:00Z"), "totalSold" : 2, "byBrand" : { "Honda" : 1, "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-14T00:00:00Z"), "totalSold" : 0, "byBrand" : {  } }

This aggregation can also be done with two $group stages and a simple $project instead of $group and a complex $project. Here it is:

db.inventory.aggregate([
   {$match : { "saleDate" : { $gte: startDate, $lt: endDate} } },
   {$addFields:{saleDate:{$dateFromParts:{year:{$year:"$saleDate"}, month:{$month:"$saleDate"}, day:{$dayOfMonth : "$saleDate" }}},dateRange:{$map:{input:{$range:[0, {$subtract:[endDate,startDate]}, 1000*60*60*24]},in:{$add:[startDate, "$$this"]}}}}},
   {$unwind:"$dateRange"},
   {$group:{
      _id:{date:"$dateRange",make:"$make"},
      count:{$sum:{$cond:[{$eq:["$dateRange","$saleDate"]},1,0]}}
   }},
   {$group:{
      _id:"$_id.date",
      total:{$sum:"$count"},
      byBrand:{$push:{k:"$_id.make",v:{$sum:"$count"}}}
   }},
   {$sort:{_id:1}},
   {$project:{
      _id:0,
      saleDate:"$_id",
      totalSold:"$total",
      byBrand:{$arrayToObject:{$filter:{input:"$byBrand",cond:"$$this.v"}}}
   }}
])

Same results:

{ "saleDate" : ISODate("2013-04-08T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Honda" : 0, "Toyota" : 0, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-09T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Honda" : 0, "Nissan" : 0, "Toyota" : 1 } }
{ "saleDate" : ISODate("2013-04-10T00:00:00Z"), "totalSold" : 6, "byBrand" : { "Honda" : 0, "Toyota" : 4, "Nissan" : 2 } }
{ "saleDate" : ISODate("2013-04-11T00:00:00Z"), "totalSold" : 1, "byBrand" : { "Toyota" : 1, "Honda" : 0, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-12T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Toyota" : 0, "Nissan" : 0, "Honda" : 0 } }
{ "saleDate" : ISODate("2013-04-13T00:00:00Z"), "totalSold" : 2, "byBrand" : { "Honda" : 1, "Toyota" : 1, "Nissan" : 0 } }
{ "saleDate" : ISODate("2013-04-14T00:00:00Z"), "totalSold" : 0, "byBrand" : { "Toyota" : 0, "Honda" : 0, "Nissan" : 0 } }

Original Answer based on 2.6:

You might want to take a look at my blog entry about how to deal with various date manipulations in Aggregation Framework here.

What you can do is use $project phase to truncate your dates to daily resolution and then run the aggregation over the whole data set (or just part of it) and aggregate by date and make.

With your sample data, say you want to know how many vehicles you sold by make, by date this year:

match={"$match" : {
               "saleDate" : { "$gt" : new Date(2013,0,1) }
      }
};

proj1={"$project" : {
        "_id" : 0,
        "saleDate" : 1,
        "make" : 1,
        "h" : {
            "$hour" : "$saleDate"
        },
        "m" : {
            "$minute" : "$saleDate"
        },
        "s" : {
            "$second" : "$saleDate"
        },
        "ml" : {
            "$millisecond" : "$saleDate"
        }
    }
};

proj2={"$project" : {
        "_id" : 0,
        "make" : 1,
        "saleDate" : {
            "$subtract" : [
                "$saleDate",
                {
                    "$add" : [
                        "$ml",
                        {
                            "$multiply" : [
                                "$s",
                                1000
                            ]
                        },
                        {
                            "$multiply" : [
                                "$m",
                                60,
                                1000
                            ]
                        },
                        {
                            "$multiply" : [
                                "$h",
                                60,
                                60,
                                1000
                            ]
                        }
                    ]
                }
            ]
        }
    }
};

group={"$group" : {
        "_id" : {
            "m" : "$make",
            "d" : "$saleDate"
        },
        "count" : {
            "$sum" : 1
        }
    }
};

Now running the aggregation gives you:

db.inventory.aggregate(match, proj1, proj2, group)
{
    "result" : [
        {
            "_id" : {
                "m" : "Toyota",
                "d" : ISODate("2013-04-10T00:00:00Z")
            },
            "count" : 4
        },
        {
            "_id" : {
                "m" : "Toyota",
                "d" : ISODate("2013-04-09T00:00:00Z")
            },
            "count" : 1
        },
        {
            "_id" : {
                "m" : "Nissan",
                "d" : ISODate("2013-04-10T00:00:00Z")
            },
            "count" : 2
        }
    ],
    "ok" : 1
}

You can add another {$project} phase to pretty up the output and you can add a {$sort} step, but basically for each date, for each make you get a count of how many were sold.

Asya Kamsky
  • 41,784
  • 5
  • 109
  • 133
  • 1
    Asya, thank you this is exactly what I was looking for. I had struggled too with breaking out functions, so this really helps. Thank you again. – Kevin Apr 16 '13 at 12:14
  • For future readers, the purpose of the first projection is to extract the hours/minutes/second, and the second is to subtract them from the original datetimes - leaving rounded dates – ZECTBynmo Jun 11 '16 at 22:47
  • This works fine but how to get count as 0 for other dates? – Srikar Jammi Mar 28 '18 at 11:52
  • @SrikarJammi thanks for the question - I'd missed originally that part of the question and it looks like the other answers did as well. I updated my answer to include generating 0 values for dates with no sales. – Asya Kamsky Apr 08 '18 at 16:15
  • 4
    $range endValue will be over a 32 bit integer and will not work. – Augie Gardner Feb 22 '19 at 01:49
  • First of all, int32 would only overflow if your date range is over 596 days and secondly I’m not sure that even relevant since dates are represented as longs (int64) in bson. – Asya Kamsky Feb 23 '19 at 21:31
  • Note also that range isn’t from 0 to end date. It’s from 0 to delta (end date minus start date). And if you wanted to keep that small you could count them off in days rather than milliseconds and convert by multiplying in the date calculation part. – Asya Kamsky Feb 23 '19 at 21:35
  • Hi @AsyaKamsky I saw you commented on one of my post. Sorry but I didn't get time to reply on it. But now you can explain the issue with that. Please reply on that post, I will take a look. – Ashh Jul 08 '19 at 05:51
  • Really really good answer here. One of the best ones on stackoverflow. – ozn Dec 16 '19 at 06:28
  • Similar question https://stackoverflow.com/questions/60371210/mongodb-aggregate-find-duplicate-records-within-7-days/60373147#60373147 . Could you help me out on it. – Akshay Hazari Feb 24 '20 at 10:25
  • I'm struggling with this solution, too. If you are using seconds and miliseconds an int32 wont be sufficient i think. Is there another option to achieve this? – J-H Apr 01 '20 at 08:01
  • Use int64 aka long? – Asya Kamsky Apr 01 '20 at 22:54
3

I like user1083621's answer but that method causes some limitations in following operations with this field - because you can not use it as date field in (for instance) next aggregation pipeline stages. You can neither compare nor use any date aggregation operations and after aggregation you'll have strings(!). All of that may be solved by projecting your original date field but in that case you'll get some difficulties with retaining it through groupping stage. And after all, sometimes you just want to manipulate with the beginning of day, not with arbitrary day time. So here's my method:

{'$project': {
    'start_of_day': {'$subtract': [
        '$date',
        {'$add': [
            {'$multiply': [{'$hour': '$date'}, 3600000]},
            {'$multiply': [{'$minute': '$date'}, 60000]},
            {'$multiply': [{'$second': '$date'}, 1000]},
            {'$millisecond': '$date'}
        ]}
    ]},
}}

It gives you this:

{
    "start_of_day" : ISODate("2015-12-03T00:00:00.000Z")
},
{
    "start_of_day" : ISODate("2015-12-04T00:00:00.000Z")
}

Can't say if it any faster than user1083621's method.

Community
  • 1
  • 1
egvo
  • 1,493
  • 18
  • 26
  • isn't this pretty much identical to my answer (minus formatting of stages)? – Asya Kamsky Apr 08 '18 at 15:08
  • @asya-kamsky Maybe it is. Maybe I found it firstly at site you mentioned. But I'm really afraid of way it presented in the answer. That's a long answer, I found it too long to read axactly, so that is why I posted mine. If someone found my answer more helpul that was their choice because all the privileges was on your side - your answer was already top rated when I posted mine. – egvo Apr 17 '18 at 12:11
  • LOL - you thought my answer was long before - it's nearly double now that I added 3.6 (latest) way to do it. :) – Asya Kamsky Apr 17 '18 at 12:24