3

I have the following scenario where I'm struggling to understand how to apply DENSE_RANK() to get the result I want:

ID Date Value
1 1990-05-17 1.00
1 1991-10-12 1.00
1 1992-08-01 1.00
1 1993-07-05 0.67
1 1994-05-02 0.67
1 1995-02-01 1.00
1 1996-03-01 1.00

Based on the above data, I'm trying to identify distinct periods using the combination of the Date and Value columns, where a unique period is identified from where the Value column changes from one value to another. Here's the result I'm looking for:

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 1
1 1992-08-01 1.00 1
1 1993-07-05 0.67 2
1 1994-05-02 0.67 2
1 1995-02-01 1.00 3
1 1996-03-01 1.00 3

As you can see, there are 3 distinct periods. The problem I am having is that when I use DENSE_RANK(), I get one of two outcomes:

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Date, Value)

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 2
1 1992-08-01 1.00 3
1 1993-07-05 0.67 4
1 1994-05-02 0.67 5
1 1995-02-01 1.00 6
1 1996-03-01 1.00 7

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Value)

ID Date Value Period
1 1990-05-17 1.00 1
1 1991-10-12 1.00 1
1 1992-08-01 1.00 1
1 1993-07-05 0.67 2
1 1994-05-02 0.67 2
1 1995-02-01 1.00 1
1 1996-03-01 1.00 1

As you can see, the problem lies with the Date column as I need that to be a cumulative period. Furthermore, the amount of periods will vary from ID to ID and there's no consistent science behind the Date column. A member could have two entries in one year for example.

forpas
  • 160,666
  • 10
  • 38
  • 76
RoyalSwish
  • 1,503
  • 10
  • 31
  • 57

2 Answers2

5

You can use LAG() window function to get for each row its previous Value and with conditional aggregation with SUM() window function get the Periods:

SELECT ID, Date, Value,
       SUM(CASE WHEN VALUE = prev_value THEN 0 ELSE 1 END) OVER (PARTITION BY ID ORDER BY Date) Period 
FROM (
  SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) prev_value
  FROM tablename
) t
ORDER BY Date;

See the demo.

forpas
  • 160,666
  • 10
  • 38
  • 76
2

This is known as a gaps and island problem. One method is to use a couple of ROW_NUMBERs to put your data into groups:


WITH CTE AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [date],[value])-
           ROW_NUMBER() OVER (PARTITION BY ID, [value] ORDER BY [date]) AS Grp
    FROM (VALUES(1,CONVERT(date,'1990-05-17'),1.00),
                (1,CONVERT(date,'1991-10-12'),1.00),
                (1,CONVERT(date,'1992-08-01'),1.00),
                (1,CONVERT(date,'1993-07-05'),0.67),
                (1,CONVERT(date,'1994-05-02'),0.67),
                (1,CONVERT(date,'1995-02-01'),1.00),
                (1,CONVERT(date,'1996-03-01'),1.00))V(ID,Date,Value))
SELECT ID,
       Date,
       Value,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY Grp) AS Period
FROM CTE;
Thom A
  • 88,727
  • 11
  • 45
  • 75