-1

I would like to do the following transformation in SQL Server database.

Aggregate over Tag_of_thing & Status_of_thing such that...

Note in this case end shift is the row above, see example tables

  1. If Start_shift_date_time = End_shift_date_time then aggregate rows such that start = min(Start_shift_date_time ) and finish = max(End_shift_date_time )

  2. And if Start_shift_date_time <= End_shift_date_time + 10mins then aggregate rows such that start = min(Start_shift_date_time ) and finish = max(End_shift_date_time )

Raw Source Data Table (lets call this table input)

Tag_of_thing    Status_of_thing Start_shift_date_time   End_shift_date_time Other_column_I_dont_care_about
A123    T123    04/08/2020 15:07:02 04/08/2020 18:00:00 56110
A123    T123    04/08/2020 18:00:00 05/08/2020 01:27:41 32473
A123    T123    05/08/2020 06:15:41 05/08/2020 06:00:00 26808
A123    T124    05/08/2020 06:00:00 05/08/2020 18:00:00 23969
A123    T124    05/08/2020 18:00:00 06/08/2020 06:00:00 29613
A123    T124    06/08/2020 06:00:00 06/08/2020 10:48:00 7276
B124    G1  03/08/2020 12:43:02 03/08/2020 18:00:00 29806
B124    G1  03/08/2020 18:00:00 03/08/2020 23:03:41 46101
B124    G1  03/08/2020 23:07:41 04/08/2020 03:55:41 15510
C124    R1  03/08/2020 12:43:02 03/08/2020 18:00:00 47527
C124    R1  03/08/2020 18:00:00 03/08/2020 23:03:41 16708
C124    R1  03/08/2020 23:18:41 04/08/2020 04:06:41 3247

enter image description here

Transformed Output Data Table

Tag_of_thing    Status_of_thing Start_shift_date_time   End_shift_date_time
A123    T123    04/08/2020 15:07:02 05/08/2020 01:27:41
A123    T123    05/08/2020 06:15:41 05/08/2020 06:00:00
A123    T124    05/08/2020 06:00:00 06/08/2020 10:48:00
B124    G1  03/08/2020 12:43:02 04/08/2020 03:55:41
C124    R1  03/08/2020 12:43:02 03/08/2020 23:03:41
C124    R1  03/08/2020 23:18:41 04/08/2020 04:06:41

enter image description here

Please let me know if you need more examples or a different explanation of the transformation process

Resources

Similar Gaps & Island Problem: https://bertwagner.com/2019/03/12/gaps-and-islands/

Shella
  • 89
  • 1
  • 7
  • Its awesome that you would like that. So what is stopping you from getting it? What have you tried? What have you researched? Where did you get stuck? Don't expect us to write your entire query for you. Do the work, have a go, and post a question when stuck. – Dale K Aug 14 '20 at 07:26
  • 1
    Also please use formatted text instead of images. – Dale K Aug 14 '20 at 07:27
  • This question is not worthy of downvotes (in my opinion) and only serves to be non-welcoming to a new user. – Gordon Linoff Aug 14 '20 at 12:05
  • Thank you for the comments, I originally attempted this using a script with pandas data frames however when implementing it was apparent this had to be conducted at the database. I had not come across terminology used by Gordon Linofff (gap-and-islands) which would have helped find an existing answer. Appreciates your comments and I'll amend the images to text and validate the answer. – Shella Aug 14 '20 at 13:49

1 Answers1

1

If I understand correctly, this is a gap-and-islands problem -- with the twist that you are allowing up to 10 minutes between rows for aggregation.

select tag_of_thing, status_of_thing, grp, min(start_shift_date_time), max(end_shift_date_time) 
from (select t.*,
             sum(case when prev_esdt > dateadd(minute, -10, start_shift_date_time)
                      then 0 else 1
                 end)  over (partition by tag_of_thing, status_of_thing order by start_shift_date_time) as grp
      from (select t.*,
                   lag(end_shift_date_time) over (partition by tag_of_thing, status_of_thing order by start_shift_date_time) as prev_esdt
            from t
           ) t
     ) t
group by tag_of_thing, status_of_thing, grp
order by tag_of_thing, status_of_thing, min(start_shift_date_time);

This identifies where the first row for a new group is by comparing the previous end time to the current start time. The group is then a cumulative sum of these values, with the final step being aggregation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786