I'm looking to pivot a transactional data set into SCD2 in such a way that captures the intervals in which each combination at the pivot grain were effective.
Snowflake is the actual DBMS I'm using, but tagging Oracle too because their dialects are nearly identical. I could probably finagle a solution provided for any DBMS though.
I have working sql, but this it's born out of trial&error and I feel like there has to be a more elegant way that I'm missing because it's very ugly and computationally expensive.
(Note: The second record in the input data "expires" the first. It can be assumed that every day of interest will occur as an add_dts at least once.) (added as image at the end until I can figure out why the markup isn't working)
Input:
Original_Grain | Pivot_Grain | Pivot_Column | Pivot_Attribute | ADD_TS |
---|---|---|---|---|
OG-1 | PG-1 | First_Col | A | 2020-01-01 |
OG-1 | PG-1 | First_Col | B | 2020-01-02 |
OG-2 | PG-1 | Second_Col | A | 2020-01-01 |
OG-3 | PG-1 | Third_Col | C | 2020-01-02 |
OG-3 | PG-1 | Third_Col | B | 2020-01-03 |
Output:
Pivot_Grain | First_Col | Second_Col | Third_Col | From_Dt | To_Dt |
---|---|---|---|---|---|
PG-1 | A | A | NULL | 2020-01-01 | 2020-01-02 |
PG-1 | B | A | C | 2020-01-02 | 2020-01-03 |
PG-1 | B | A | B | 2020-01-03 | 9999-01-01 |
WITH INPUT AS
( SELECT 'OG-1' AS Original_Grain,
'PG-1' AS Pivot_Grain,
'First_Col' AS Pivot_Column,
'A' AS Pivot_Attribute,
TO_DATE('2020-01-01','YYYY-MM-DD') AS Add_Dts
FROM dual
UNION
SELECT 'OG-1' AS Original_Grain,
'PG-1' AS Pivot_Grain,
'First_Col' AS Pivot_Column,
'B' AS Pivot_Attribute,
TO_DATE('2020-01-02','YYYY-MM-DD')
FROM dual
UNION
SELECT 'OG-2' AS Original_Grain,
'PG-1' AS Pivot_Grain,
'Second_Col' AS Pivot_Column,
'A' AS Pivot_Attribute,
TO_DATE('2020-01-01','YYYY-MM-DD')
FROM dual
UNION
SELECT 'OG-3' AS Original_Grain,
'PG-1' AS Pivot_Grain,
'Third_Col' AS Pivot_Column,
'C' AS Pivot_Attribute,
TO_DATE('2020-01-02','YYYY-MM-DD')
FROM dual
UNION
SELECT 'OG-3' AS Original_Grain,
'PG-1' AS Pivot_Grain,
'Third_Col' AS Pivot_Column,
'B' AS Pivot_Attribute,
TO_DATE('2020-01-03','YYYY-MM-DD')
FROM dual
),
GET_NORMALIZED_RANGES AS
( SELECT I.*,
COALESCE(
LEAD(Add_Dts) OVER (
PARTITION BY I.Original_Grain
ORDER BY I.Add_Dts), TO_DATE('9000-01-01')
) AS Next_Add_Dts
FROM INPUT I
),
GET_DISTINCT_ADD_DATES AS
( SELECT DISTINCT Add_Dts AS Driving_Date
FROM Input
),
NORMALIZED_EFFECTIVE_AT_EACH_POINT AS
( SELECT GNR.*,
GDAD.Driving_Date
FROM GET_NORMALIZED_RANGES GNR
INNER
JOIN GET_DISTINCT_ADD_DATES GDAD
ON GDAD.driving_date >= GNR.add_dts
AND GDAD.driving_Date < GNR.next_add_dts
),
PIVOT_EACH_POINT AS
( SELECT DISTINCT
Pivot_Grain,
Driving_Date,
MAX("'First_Col'") OVER ( PARTITION BY Pivot_Grain, Driving_Date) AS First_Col,
MAX("'Second_Col'") OVER ( PARTITION BY Pivot_Grain, Driving_Date) AS Second_Col,
MAX("'Third_Col'") OVER ( PARTITION BY Pivot_Grain, Driving_Date) AS Third_Col
FROM NORMALIZED_EFFECTIVE_AT_EACH_POINT NEP
PIVOT (MAX(Pivot_Attribute) FOR PIVOT_COLUMN IN ('First_Col','Second_Col','Third_Col'))
)
SELECT Pivot_Grain,
Driving_Date AS From_Dt,
COALESCE(LEAD(Driving_Date) OVER ( PARTITION BY pivot_grain ORDER BY Driving_Date),TO_DATE('9999-01-01')) AS To_Dt,
First_Col,
Second_Col,
Third_Col
FROM PIVOT_EACH_POINT