7

I have a table that looks like the following. What I want is the the rows in continuation of each other to be grouped together - for each "ID". The column IsContinued marks if the next row should be combined with the current row

My data looks like this:

+-----+--------+-------------+-----------+----------+
| ID  | Period | IsContinued | StartDate | EndDate  |
+-----+--------+-------------+-----------+----------+
| 123 | 1      | 1           | 20180101  | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2      | 1           | 20180501  | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3      | 0           | 20181001  | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4      | 1           | 20190105  | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5      | 0           | 20190401  | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2      | 1           | 20180201  | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3      | 0           | 20180301  | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4      | 0           | 20180501  | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5      | 0           | 20180701  | 20180705 |
+-----+--------+-------------+-----------+----------+

The end result I want is this:

+-----+-------------+-----------+-----------+----------+
| ID  | PeriodStart | PeriodEnd | StartDate | EndDate  |
+-----+-------------+-----------+-----------+----------+
| 123 | 1           | 3         | 20180101  | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4           | 5         | 20190105  | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2           | 3         | 20180201  | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4           | 4         | 20180501  | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5           | 5         | 20180701  | 20180705 |
+-----+-------------+-----------+-----------+----------+

DDL Statement:

CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
                      (123,2,1,'20180501', '20180910'),
                      (123,3,0,'20181001', '20181201'),
                      (123,4,1,'20190105', '20190228'),
                      (123,5,0,'20190401', '20190430'),
                      (456,2,1,'20180201', '20180215'),
                      (456,3,0,'20180301', '20180401'),
                      (456,4,0,'20180501', '20180530'),
                      (456,5,0,'20180701', '20180705')

The code should be run on SQL Server 2016

Thanks!

Aaron Dietz
  • 10,137
  • 1
  • 14
  • 26
Tiawy
  • 175
  • 5
  • 11
  • I'm guessing the order of the rows is determined by the `Period` column? Also, sample data is best served as [DDL](https://en.wikipedia.org/wiki/Data_definition_language) + [DML](https://en.wikipedia.org/wiki/Data_manipulation_language). Please [edit] your question to include it, your current attempt and your desired results. For more details, [read this.](https://dba.meta.stackexchange.com/questions/2976/help-me-write-this-query-in-sql) – Zohar Peled Sep 06 '18 at 12:25
  • Way to go on the sample data! Now all you need to [edit] into your question is your attempt to solve the problem yourself... For more information, read [ask]. – Zohar Peled Sep 06 '18 at 13:36

1 Answers1

4

Here is one approach:

with removeFluff as
(
SELECT *
FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
     ) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)    
,getValues as
(
SELECT ID,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
       PeriodNr PeriodEnd,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
       EndDate,
       IsContinued
FROM removeFluff r
)

SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0

Output:

ID  PeriodStart PeriodEnd   StartDate   EndDate
123    1           3        2018-01-01  2018-12-01
123    4           5        2019-01-05  2019-04-30
456    2           3        2018-02-01  2018-04-01
456    4           4        2018-05-01  2018-05-30
456    5           5        2018-07-01  2018-07-05

Method:

  • removeFluff cte removes lines that are unimportant. Theses are the records that don't start or end a segment (line 2 in your sample data)
  • Now that the fluff is removed, we know that either:
  • A.) The line is complete on it's own (LAG(IsContinued) ... = 0), ie. previous line is complete
  • B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
  • We apply these two cases in the CASE expression of the getValues cte
  • Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get "start" data on the "end" data row, so we only want to select the end rows
Aaron Dietz
  • 10,137
  • 1
  • 14
  • 26