-1

I am working on one of requirement the raw data is in following format

Requirement - Startdate should be the date when status changed to 1 and enddate should be the 1st date after the record status changed from 1 to any other number.

Customer Status Date
A123 0 7/2/2021
A123 0 7/15/2021
A123 0 7/22/2021
A123 1 8/18/2021
A123 1 9/8/2021
A123 0 12/1/2021
A123 0 1/21/2022
A123 1 3/6/2022
A123 1 3/7/2022
A123 0 3/15/2022
B123 1 1/1/2022
B123 0 1/6/2022
C123 1 1/2/2022
C123 2 1/8/2022
C123 0 1/9/2022

expected output

Customer StartDate EndDate
A123 8/18/2021 12/1/2021
A123 9/8/2021 12/1/2021
A123 3/6/2022 3/15/2022
A123 3/7/2022 3/15/2022
B123 1/1/2022 1/6/2022
C123 1/2/2022 1/8/2022

Query I tried to get the output is below, I am getting the output for Customer B123 and C123, but not for A123 as expected.

Query Explanation - In 1st part of query I am taking all the records with status = 1 and in next part taking only those records where status is not equal to 1, and joining these 2 datasets based on Customer and row number generated.

SELECT A.[Customer],A.StartDate,B.EndDate
    from
    (
         SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
         FROM table1
         WHERE [STATUS] = 1
         GROUP BY Customer,Date,[Status]
     ) A
     LEFT JOIN
     (
        SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
         FROM table1
         WHERE [STATUS] != 1 
         AND Date>(
                    SELECT MIN(Date) AS STARTDATE
                    FROM table1
                    WHERE [STATUS] = 1
                )
     GROUP BY Customer,Date,[Status]
     ) B 
        ON 
        (
            A.[Customer] = B.[Customer] 
            AND A.RowNum = B.RowNum
        )
     ORDER BY A.Startdate
Charlieface
  • 52,284
  • 6
  • 19
  • 43
BIDeveloper
  • 767
  • 3
  • 14
  • 28
  • 3
    Images of data doesn't help us help you. Neither does no explanation of the logic. – Thom A Mar 31 '22 at 12:05
  • 1
    @Larnu updated the question with data instead of image and added the requirement and explanation – BIDeveloper Mar 31 '22 at 12:21
  • 1
    Note: this is an "Islands and Gaps" type problem that is best solved using windowing aggregate functions (usually the difference of two of them). Cross Joins and Left Joins can be made to work but are pretty inefficient (O(n^2)) compared to a differential query (O(n)). – RBarryYoung Mar 31 '22 at 12:43

2 Answers2

1

First you list the rows where Status = 1 and then use CROSS APPLY to get the corresponding minimum Date where the Status is not equal to 1

select s.[Customer],
       StartDate = s.[Date],
       EndDate   = e.[Date]
from   Table1 s
       cross apply
       (
           select [Date] = min(e.[Date])
           from   Table1 e
           where  e.[Customer] = s.[Customer]
           and    e.[Date]     > s.[Date]
           and    e.[Status]   <> 1
       ) e
where  s.[Status] = 1 
order by s.[Customer], s.[Date]
Squirrel
  • 23,507
  • 4
  • 34
  • 32
1

Here is a more efficient way to do this without a self-join.

WITH cte01only AS
  ( SELECT *, CASE Status WHEN 1 THEN 1 ELSE 0 END AS Status1 FROM table1 ),
  cteDifference AS
  (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY Customer  ORDER BY Date, Status1)
        - ROW_NUMBER() OVER (PARTITION BY Customer, Status1  ORDER BY Date) AS StatusGroup
    FROM cte01only
  ),
  cteGroup AS
  (
    SELECT   Customer, StatusGroup, Status1, MIN(Date) As StartDate
    FROM cteDifference
    GROUP BY Customer, StatusGroup, Status1
  ),
  cteNextDate AS
  (
    SELECT Customer, StatusGroup, Status1, StartDate,
        LEAD(StartDate, 1, NULL) OVER (PARTITION BY Customer  ORDER BY StatusGroup) AS EndDate
    FROM cteGroup
  )
SELECT Customer, StartDate, EndDate
FROM cteNextDate
WHERE Status1 = 1
ORDER BY Customer, StateDate

The key trick here is the second CTE which uses the difference of two ROW_NUMBER() functions to tag the customer records (with the StatusGroup column) into separate partitions by contiguous runs of records whose status is 1 or not 1. After that they can be grouped according to that tag to get the start dates, and then use the LEAD() function to get the following group's StartDate as the current groupings EndDate.

(There may be a more compact way to express this, but I like to layout each stage as a separate CTE.)

RBarryYoung
  • 55,398
  • 14
  • 96
  • 137