0

although there are similar questions here, how to find / count documents in a specific week range, I was unable to find out, how to match / count documents from last X weeks. Consider this query:

db.getCollection('post').aggregate([

   {
        $match: {
            createdDate: {'$gte': new Date(new Date() - 7 * 60 * 60 * 24 * 1000)},
        }
    },
    {
        $group: {
            _id: {$week: '$createdDate'},
            count: {$sum: 1}
        }
    }
])

This query counts documents from the last 7 days and groups the count by the week number - but the result actually contain two fileds: the number of of documents from the current week and the number of documents from last week starting from the day which is current day - 7 days.

Example - query starting on Wednesday:

  • Last 7 days previous week: [Thu, Fri, Sat, Sun] | current week: [ Mon, Tue, Wed]
  • last week previous week: [Mon, Tue, Wed, Thu, Fri, Sat, Sun] | current week: [ ]

The query should return right result not matter if started on first day of a week or last day of a week. Also year change has to be taken in account - last week of December has week number 52, first week of January week number 1.

Other potential issues may result from weeks without any related documents. You just can't group documents from the last 21 Days by the week number and use $sort and $limit, because weeks without any related document wouldn't appear in the grouping.

karlitos
  • 1,604
  • 3
  • 27
  • 59
  • can you share more info like expected result you want and current result you're getting? – harshit kohli May 07 '20 at 20:44
  • Lets say In a week 10 - which is the week number of the current date - I would like to know number of documents created in the week with the week number 9. The current approach matches all documents from the last 7 days, which covers part of the week 10 and part of the week 9 – karlitos May 07 '20 at 20:49
  • okay got you point i have one approach i'm posting the answer – harshit kohli May 07 '20 at 21:01

2 Answers2

1

A spend some serious time trying to find an own solution based around the $week operator and decided to post it here, but in the end I realised, that the approach suggested by harshit kohli is better. I will explain later. Upon further investigation it came to my attention, that the first answer have some major flaws resulting from not taking weeks without any related documents in consideration.

UPDATE:

I finally found a solution based on week numbers which also should give right results even when the query spreads over the year change. The key is to subtract 52 (yearly week number)

db.getCollection('post').aggregate([
{
    {
       $match: { // match posts at least 2 Weeks + 6 days old
          createdDate: {'$gte':new Date(new Date().setDate(new Date().getDate() - 3 * 7 - 1))},
       }
    },
    {
       $addFields: {
          createdInWeek: { 
             $cond: {
                 if: {
                     $eq: [
                        { $year: "$createdDate"},
                        { $year: new Date()}
                     ]
                 }, then: {
                     $week: "$createdDate"
                 }, else: {
                     $subtract: [ { $week: "$createdDate"}, 52]
                 }
              }
          },
          currentWeek: { $week: new Date() },
      }
  },
  {
      $match: {
          $expr: {
              $and: [
                  {$lte: ["$createdInWeek", { $subtract: ["$currentWeek", 1] } ]},
                  {$gte: ["$createdInWeek", { $subtract: ["$currentWeek", 2] } ]},
              ]
          }
      }    
  },
  {
      $group: {
          _id: {$week: '$createdDate'},
          count: {
              $sum: 1
          }
      }
  },
])

Thus, there still might be issues with years containing 53 weeks, reference here

PREVIOUS:

db.getCollection('post').aggregate([
{   // optional, can pre-sort your collection, so you don't add fields to every document in a huge collection
    $match: {
        createdDate: {'$gte':new Date(new Date().setDate(new Date().getDate() - 2 * 7 + 1))},
    }
},
{
    $addFields: {
        createdInWeek: { $week: '$createdDate' },
        currentWeek: { $week: new Date() },
    }
},
{
    $match: { // use of $expr. necessary since MongoDb 3.6 see https://stackoverflow.com/a/58567621/1991697
        $expr: {
           $and: [
              {$lte: ["$createdInWeek", { $subtract: ["$currentWeek", 1] } ]},
              {$gte: ["$createdInWeek", { $subtract: ["$currentWeek", 2] } ]},
           ]
        }
    }               
},
{
   $group: {
       _id: {$week: '$createdDate'},
       count: {
          $sum: 1
       }
    }
},    
])

This approach let Mongo calculate the week numbers of the current week and the createdDate for every document in your collection and you can do then simple filetring using further $match operator. It will also allow you to set the week range easier.

BUT

This will probably fail if your week-range will split across the break of a year. (First January week is week 0 , last December week is week 52)

karlitos
  • 1,604
  • 3
  • 27
  • 59
0

try adding two more stages first sort them on the basis of group _id i.e week number then limit result to 1

 db.getCollection('post').aggregate([

       {
            $match: {
                createdDate: {'$gte': new Date(new Date() - 15 * 60 * 60 * 24 * 1000)},
            }
        },
        {
            $group: {
                _id: {$week: '$createdDate'},
                count: {$sum: 1}
            }
        },
       {
         $sort:{
        _id:-1  
        }
       },
      {
       $skip:1
       },
       {
        $limit:1
       }
    ])
harshit kohli
  • 302
  • 2
  • 8
  • Thank you very much for your effort, but this does not solves the match issue. Lets say from Wednesday, the last seven days are: 9:[Thu, Fri, Sat, Sun] 10:[ Mon, Tue, Wed] but we are looking for 9:[Mon, Tue, Wed, Thu, Fri, Sat, Sun] 10: [] Maybe I shall post this explanantion to my original question. – karlitos May 07 '20 at 21:17
  • just for my understanding from saturday or friday of 10th week you still want 9th week data? – harshit kohli May 07 '20 at 21:25
  • From 10th week I will only 9th full week data. – karlitos May 07 '20 at 21:31
  • I've one more approach like instead of doing last 7 days calculate for 15 days and add skip 1 before limit that might solve your problem ( i've edited the proposed answer as well kindly review it) – harshit kohli May 07 '20 at 21:33
  • This actually works, but I was hoping somebody could give me some hint how to do matching-by-week-numbers. Your solution, although correct, does not scale very well, when the query should be extended to: "last two weeks"or similar – karlitos May 07 '20 at 21:48
  • you can do that dynamically as well! are you using some programming language as interface or directly querying in shell? – harshit kohli May 07 '20 at 22:04
  • First I though your solution is right, but it turned out it's not bulletproof. You don't take in consideration, that you may have weeks without any posts. There would'n be any group fir such a week - you may end with array counting for posts in a week 9 and 11, skipping the 10 and all the array operations as skip and limit would be than false – karlitos May 11 '20 at 14:39