0

I have many documents like this:

{_id: ObjectId("5adc864eaaf408a2b6e325f7"), employee: ObjectId("5adc864eaa3c92b3c4c252c1"), end: { day: "2018-04-22 12:06:46.623" }, start: { day: "2018-04-22 11:06:46.623" }, date: "2018-04-22 11:06:46.623"}
{_id: ObjectId("5adc864eaaf408a2b6e325c8"),employee: ObjectId("5adc864eaa3c92b3c4c252c1"), end: { day: "2018-04-22 10:06:46.623" }, start: { day: "2018-04-22 8:06:46.623" }, date: "2018-04-22 11:06:46.623"}
{_id: ObjectId("5adc864eaaf408a2b6e325f6"),employee: ObjectId("5adc864eaa3c92b3c4c252c1"), end: { day: "2018-05-22 12:06:46.623" }, start: { day: "2018-04-22 11:06:46.623" }, date: "2018-05-22 11:06:46.623"}
{_id: ObjectId("5adc864eaaf408a2b6e325c4"),employee: ObjectId("5adc864eaa3c92b3c4c252c1"), end: { day: "2018-05-22 10:06:46.623" }, start: { day: "2018-05-22 8:06:46.623" }, date: "2018-05-22 11:06:46.623"}

this represents each employee's activities during the day.

I need to calculate the amount of hours worked per day, taking the number of hours of each activity between the start date "start.day" and end date "end.day" of each activity and sum all the activities of one day.

I try with some aggregates like redact, sum, substrac but I can't figured what logic I need to do to achieve this.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
cmarrero01
  • 691
  • 7
  • 21
  • You actually have an item there which spans more than one day. Is that just a result of your "copy/paste" in the question ( because there's also duplicate `_id` values ) or is that an indication of your real data? Which should be the "grouping day" anyway? Is it from the range of start to end, or are those just for the time difference and the "day" is in fact the "date" field? – Neil Lunn Mar 18 '19 at 00:28
  • To be clear, what you asked shows in order of documents, `0` then `3` then `720` and then `3` as the differences in hours between each set. So it's unclear if the `720` is a mistake in the question or if you really mean `24` hours on every day in between the two dates. – Neil Lunn Mar 18 '19 at 00:33
  • I edit the example, that was only an example to make the question here, I need the total hours that a employee worked per day and I have a lot of documents with many activities per day with an start and end date. – cmarrero01 Mar 18 '19 at 00:39
  • So every document would only have a difference in hours in a single day, and never span more than one day? Is that correct? Also what are you adding? Each employee per day? – Neil Lunn Mar 18 '19 at 00:51
  • I need the hours per day, but they have several activities per day, so, I need to filter that and get the amount of hours worked per day with a bunch of documents per day, I assume that I can group by employee and date, and take the difference between start and end date of each activity and sum the total of all activities per day. – cmarrero01 Mar 18 '19 at 02:08
  • Some activities can start, for example, monday at 5 pm, and finish thuesday 3 pm, but that is an exemption, the common rules is that all activities start and end the same day. – cmarrero01 Mar 18 '19 at 02:10

1 Answers1

1

So the first thing really to cover is your current "dates" are all "strings", and this really does not help. You would be better served converting everything to BSON Date, since that is basically what is required in the aggregation operations anyway.

Second point is that getting the totals for "each day" within an interval is not easy. In fact you really need to throw some expressions at MongoDB in order to do such a thing:

db.collection.aggregate([
  { "$addFields": {
    "start": { "$toDate": "$start.day" },
    "end": { "$toDate": "$end.day" },
    "date": { "$toDate": "$date" },
    "dayworking": {
      "$map": {
        "input": {
          "$range": [
            0,
            { "$ceil": {
              "$divide": [
                { "$subtract": [
                  { "$toDate": "$end.day" },
                  { "$toDate": "$start.day" }
                ]},
                1000 * 60 * 60 * 24
              ]
            }}
          ]
        },
        "in": {
          "$toDate": {
            "$add": [
              { "$multiply": ["$$this", 1000 * 60 * 60 * 24 ] },
              { "$subtract": [
                { "$toLong": { "$toDate": "$start.day" } },
                { "$mod": [ { "$toLong": { "$toDate": "$start.day" } }, 1000 * 60 * 60 * 24 ] }
              ]}
            ]
          }
        }
      }
    }
  }},
  { "$unwind": "$dayworking" },
  { "$group": {
    "_id": {
      "employee": "$employee",
      "day": "$dayworking"
    },
    "hours": {
      "$sum": {
        "$floor": {
          "$divide": [
            { "$switch": {
              "branches": [
                { 
                  "case": {
                    "$and": [
                      { "$lt": [ "$dayworking", "$start" ] },
                      { "$gt": [
                        { "$add": [ "$dayworking", 1000 * 60 * 60 * 24 ] },
                        "$end"
                      ]}
                    ]
                  },
                  "then": { "$subtract": [ "$end", "$start" ] }
                },
                {
                  "case": {
                    "$lt": [
                      "$end",
                      { "$add": [ "$dayworking", 1000 * 60 * 60 * 24 ] }
                    ]
                  },
                  "then": {
                    "$subtract": [ "$end", "$dayworking" ]
                  }
                },
                {
                  "case": { "$lt": [ "$dayworking", "$start" ] },
                  "then": {
                    "$subtract": [
                      { "$add": [ "$dayworking", 1000 * 60 * 60 * 24 ] },
                      "$start"
                    ]
                  }
                }
              ],
              "default": 1000 * 60 * 60 * 24
            }},
            1000 * 60 * 60
          ]
        }
      }
    }
  }},
  { "$sort": { "_id": 1 } }
])

Basically returns every day within the start and end intervals as (truncating for brevity):

{
        "_id" : {
                "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
                "day" : ISODate("2018-04-22T00:00:00Z")
        },
        "hours" : 15
}
{
        "_id" : {
                "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
                "day" : ISODate("2018-04-23T00:00:00Z")
        },
        "hours" : 24
}

.... each day in between ...

{
        "_id" : {
                "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
                "day" : ISODate("2018-05-21T00:00:00Z")
        },
        "hours" : 24
}
{
        "_id" : {
                "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
                "day" : ISODate("2018-05-22T00:00:00Z")
        },
        "hours" : 14
}

Which is the allocated 24 hours on "whole days" and the partial hours on others. As from your example the first day has data that generates as:

{
        "_id" : ObjectId("5adc864eaaf408a2b6e325f7"),
        "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
        "end" : ISODate("2018-04-22T12:06:46.623Z"),
        "start" : ISODate("2018-04-22T11:06:46.623Z"),
        "date" : ISODate("2018-04-22T11:06:46.623Z"),
        "dayending" : ISODate("2018-04-22T00:00:00Z"),
        "hours" : 1
}
{
        "_id" : ObjectId("5adc864eaaf408a2b6e325c8"),
        "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
        "end" : ISODate("2018-04-22T10:06:46.623Z"),
        "start" : ISODate("2018-04-22T08:06:46.623Z"),
        "date" : ISODate("2018-04-22T11:06:46.623Z"),
        "dayending" : ISODate("2018-04-22T00:00:00Z"),
        "hours" : 2
}
{
        "_id" : ObjectId("5adc864eaaf408a2b6e325f6"),
        "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
        "end" : ISODate("2018-05-22T12:06:46.623Z"),
        "start" : ISODate("2018-04-22T11:06:46.623Z"),
        "date" : ISODate("2018-05-22T11:06:46.623Z"),
        "dayending" : ISODate("2018-04-22T00:00:00Z"),
        "hours" : 12
}

Being the two sole entries and one with a 12 hour remainder which makes up the 15 hours and the last day:

{
        "_id" : ObjectId("5adc864eaaf408a2b6e325f6"),
        "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
        "end" : ISODate("2018-05-22T12:06:46.623Z"),
        "start" : ISODate("2018-04-22T11:06:46.623Z"),
        "date" : ISODate("2018-05-22T11:06:46.623Z"),
        "dayending" : ISODate("2018-05-22T00:00:00Z"),
        "hours" : 12
}
{
        "_id" : ObjectId("5adc864eaaf408a2b6e325c4"),
        "employee" : ObjectId("5adc864eaa3c92b3c4c252c1"),
        "end" : ISODate("2018-05-22T10:06:46.623Z"),
        "start" : ISODate("2018-05-22T08:06:46.623Z"),
        "date" : ISODate("2018-05-22T11:06:46.623Z"),
        "dayending" : ISODate("2018-05-22T00:00:00Z"),
        "hours" : 2
}

Has a 2 hour entry and another 12 hour remainder making 14 in total.

Explanation

Date Conversions and Math

To explain that there are essentially two main things that need to be done outside of the obvious "date conversion". That by the way can be done with $toDate from MongoDB 4.0 or via $dateFromString if you have MongoDB 3.6. Noting that in the latter case you also need to apply different methods for the "date math"

There are detailed examples of handling "date math" in earlier MongoDB versions on Group result by 15 minutes time interval in MongoDb in the cases where you have an earlier version and either $dateFromString or straight conversion of your data is required first.

Projecting the Dates in the Range

The next main part to make this work is you basically need to construct an array of the dates which a document applies to inside the source document. This is what the $range expression does, by taking a starting value ( 0 in this case ) and an ending value which here we apply to being the "number of days between" the start and end date values.

That difference is returned from a $subtract in milliseconds, so a $divide is used over the constant milliseconds in a day to get a whole integer. Using $ceil here to round, but this could just as easily be $mod with $subtract where that operator is not available in earlier versions.

At that point the $range has really just produced an array of integer values, so the $map is applied to that array in order to convert these to the actual BSON Date objects that should represent the "day" that data applies for. Again it's just some "date math" applying an addition of the array index value ( +1 of course ) to the original rounded starting date.

Calculating the hours

Now with an array of dates from the earlier stage and some other reformatting of document values into usable BSON Dates, you need to actually compare this "array" content with each start and end value to determine how many hours applied in that day.

The first basic case and why we actually created an array for this is to use $unwind, which effectively copies the result document for every day occurring within the interval. It's a small but important step that must happen before you $group and actually count things. Bottom line is the $group will actually use those values as part of it's "primary key" for output, and also comparison to other date information.

Of course the real work here is all being done in the $switch statement, which again can just as well be a "nested" use of $cond in earlier versions. Here you basically want to analyse the three possible cases and of course the default fallback for the "full day".

The cases basically are:

  • Where the current "grouping day" is less than the start AND the "following day" would be greater than the end date, simply subtract the difference.

  • If not the above, then when the end date is less than the "following day" for grouping then subtract that "grouping day" from the current end date to get hours of that start of day until the end time.

  • If not the above, then when the "grouping day" is less than the start ( without the other end condition from earlier ) then the hours worked would be from the "following day" subtracting the start for the difference from start to the end of the current day.

  • If those were not true then default to showing a "whole day", which is being shown as 24 hours in this example.

If you had other working hours to apply, then it's just a matter of adjusting that in, i.e "start of day" +8 hours for an 8 a.m start. The same thing basically applies to "end of day" by adding something like +17 for a 5.pm finish. But the basic principles of the logic to implement are still the same as shown above.

NOTE: The main constraint here is $range, which I believe was from MongoDB 3.0 or possibly 3.2. At any rate you probably really should not be running any version of MongoDB prior to 3.4 at this present time.

If you did have an earlier version, then there are some more details on Group and count over a start and end range with another earlier answer of mine showing a similar process using multiple queries and even mapReduce() in addition to a very similar $range example.

Community
  • 1
  • 1
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317