DENSE_RANK on two columns, where one column are distinct values and the other has duplicate

Question

I have the following scenario where I'm struggling to understand how to apply DENSE_RANK() to get the result I want:

ID	Date	Value
1	1990-05-17	1.00
1	1991-10-12	1.00
1	1992-08-01	1.00
1	1993-07-05	0.67
1	1994-05-02	0.67
1	1995-02-01	1.00
1	1996-03-01	1.00

Based on the above data, I'm trying to identify distinct periods using the combination of the Date and Value columns, where a unique period is identified from where the Value column changes from one value to another. Here's the result I'm looking for:

ID	Date	Value	Period
1	1990-05-17	1.00	1
1	1991-10-12	1.00	1
1	1992-08-01	1.00	1
1	1993-07-05	0.67	2
1	1994-05-02	0.67	2
1	1995-02-01	1.00	3
1	1996-03-01	1.00	3

As you can see, there are 3 distinct periods. The problem I am having is that when I use DENSE_RANK(), I get one of two outcomes:

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Date, Value)

ID	Date	Value	Period
1	1990-05-17	1.00	1
1	1991-10-12	1.00	2
1	1992-08-01	1.00	3
1	1993-07-05	0.67	4
1	1994-05-02	0.67	5
1	1995-02-01	1.00	6
1	1996-03-01	1.00	7

SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY Value)

ID	Date	Value	Period
1	1990-05-17	1.00	1
1	1991-10-12	1.00	1
1	1992-08-01	1.00	1
1	1993-07-05	0.67	2
1	1994-05-02	0.67	2
1	1995-02-01	1.00	1
1	1996-03-01	1.00	1

As you can see, the problem lies with the Date column as I need that to be a cumulative period. Furthermore, the amount of periods will vary from ID to ID and there's no consistent science behind the Date column. A member could have two entries in one year for example.

score 5 · Accepted Answer · answered Aug 23 '21 at 13:45

You can use LAG() window function to get for each row its previous Value and with conditional aggregation with SUM() window function get the Periods:

SELECT ID, Date, Value,
       SUM(CASE WHEN VALUE = prev_value THEN 0 ELSE 1 END) OVER (PARTITION BY ID ORDER BY Date) Period 
FROM (
  SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) prev_value
  FROM tablename
) t
ORDER BY Date;

See the demo.

score 2 · Answer 2 · answered Aug 23 '21 at 13:41

This is known as a gaps and island problem. One method is to use a couple of ROW_NUMBERs to put your data into groups:


WITH CTE AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [date],[value])-
           ROW_NUMBER() OVER (PARTITION BY ID, [value] ORDER BY [date]) AS Grp
    FROM (VALUES(1,CONVERT(date,'1990-05-17'),1.00),
                (1,CONVERT(date,'1991-10-12'),1.00),
                (1,CONVERT(date,'1992-08-01'),1.00),
                (1,CONVERT(date,'1993-07-05'),0.67),
                (1,CONVERT(date,'1994-05-02'),0.67),
                (1,CONVERT(date,'1995-02-01'),1.00),
                (1,CONVERT(date,'1996-03-01'),1.00))V(ID,Date,Value))
SELECT ID,
       Date,
       Value,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY Grp) AS Period
FROM CTE;

DENSE_RANK on two columns, where one column are distinct values and the other has duplicate

2 Answers2