I've a dataset as under:
+----+-------+---------------------+
| ID | SUBID | date |
+----+-------+---------------------+
| A | 1 | 2021-01-01 12:00:00 |
| A | 1 | 2021-01-02 01:00:00 |
| A | 1 | 2021-01-02 02:00:00 |
| A | 1 | 2021-01-03 03:00:00 |
| A | 2 | 2021-01-05 16:00:00 |
| A | 2 | 2021-01-06 13:00:00 |
| A | 2 | 2021-01-07 06:00:00 |
| A | 2 | 2021-01-08 08:00:00 |
| A | 2 | 2021-01-08 10:00:00 |
| A | 2 | 2021-01-08 11:00:00 |
| A | 3 | 2021-01-09 09:00:00 |
| A | 3 | 2021-01-10 19:00:00 |
| A | 3 | 2021-01-11 20:00:00 |
| A | 3 | 2021-01-12 22:00:00 |
| B | 1 | 2021-02-01 23:00:00 |
| B | 1 | 2021-02-02 15:00:00 |
| B | 1 | 2021-02-03 06:00:00 |
| B | 1 | 2021-02-04 08:00:00 |
| B | 2 | 2021-02-05 18:00:00 |
| B | 2 | 2021-02-05 19:00:00 |
| B | 2 | 2021-02-06 22:00:00 |
| B | 2 | 2021-02-06 23:00:00 |
| B | 2 | 2021-02-07 04:00:00 |
| B | 2 | 2021-02-08 02:00:00 |
| B | 3 | 2021-02-09 01:00:00 |
| B | 3 | 2021-02-10 03:00:00 |
| B | 3 | 2021-02-11 13:00:00 |
| B | 3 | 2021-02-12 14:00:00 |
+----+-------+---------------------+
I want to be able to get the time difference between each ID and SUBID group in hours, preferably in terms of business hours, where each of the date that appears on a weekend or a federal holiday can be moved to a nearest weekday (preceding or succeeding) with a time of 23:59:59 as under:
+----+-------+---------------------+------------------------------------------------------------------+
| ID | SUBID | date | timediff (hours) with preceding date for each group (ID, SUBID) |
+----+-------+---------------------+------------------------------------------------------------------+
| A | 1 | 2021-01-01 12:00:00 | 0 |
| A | 1 | 2021-01-02 01:00:00 | 13 |
| A | 1 | 2021-01-02 02:00:00 | 1 |
| A | 1 | 2021-01-03 03:00:00 | 1 |
| A | 2 | 2021-01-05 16:00:00 | 0 |
| A | 2 | 2021-01-06 13:00:00 | 21 |
| A | 2 | 2021-01-07 06:00:00 | 17 |
| A | 2 | 2021-01-08 08:00:00 | 2 |
| A | 2 | 2021-01-08 10:00:00 | 2 |
| A | 2 | 2021-01-08 11:00:00 | 1 |
| A | 3 | 2021-01-09 09:00:00 | 0 |
| A | 3 | 2021-01-10 19:00:00 | 36 |
| A | 3 | 2021-01-11 20:00:00 | 1 |
| A | 3 | 2021-01-12 22:00:00 | 1 |
| B | 1 | 2021-02-01 23:00:00 | 0 |
| B | 1 | 2021-02-02 15:00:00 | 16 |
| B | 1 | 2021-02-03 06:00:00 | 15 |
| B | 1 | 2021-02-04 08:00:00 | 26 |
| B | 2 | 2021-02-05 18:00:00 | 0 |
| B | 2 | 2021-02-05 19:00:00 | 1 |
| B | 2 | 2021-02-06 22:00:00 | 27 |
| B | 2 | 2021-02-06 23:00:00 | 1 |
| B | 2 | 2021-02-07 04:00:00 | 5 |
| B | 2 | 2021-02-08 02:00:00 | 22 |
| B | 3 | 2021-02-09 01:00:00 | 0 |
| B | 3 | 2021-02-10 03:00:00 | 26 |
| B | 3 | 2021-02-11 13:00:00 | 11 |
| B | 3 | 2021-02-12 14:00:00 | 1 |
+----+-------+---------------------+------------------------------------------------------------------+
and lastly I want to calculate the average time which would be the sum of time differences per group (ID, SUBID) divide by the total count per group as under:
+----+-------+------------------------------------------------------------+
| ID | SUBID | Average time (count per group/ total time diff of group ) |
+----+-------+------------------------------------------------------------+
| A | 1 | 15/4 |
| A | 2 | 43/6 |
| A | 3 | 38/4 |
| B | 1 | 57/4 |
| B | 2 | 56/6 |
| B | 3 | 38/4 |
+----+-------+------------------------------------------------------------+
I'm fairly new to R and I came across lubridate to help me format the dates and I wasable to get the time diff using the code below
df%>%
group_by(ID, SUBID) %>%
mutate(time_diff = difftime(date, lag(date), unit = 'min'))
However I'm having troubles getting difference of just the business days time and also getting the average time as per the last table