0

I have the table below and I want to write a query that add's two columns:

Valid From = Indicating when a relationship between account & opp started

Valid To = Indicating when a relationship between account & opp ended

**NOTE THAT THE INSERT TIMESTAMP IS UPDATED IF ANY FIELD CHANGES IE COLOR NOT JUST IF A RELATIONSHIP CHANGES **

Query to create table in http://sqlfiddle.com/#!9/0bfae1/1

CREATE TABLE accounts (
  account_id VARCHAR(100),
  load_timestamp TIMESTAMP,
  opportunity_id VARCHAR(100),
  COLOR VARCHAR(100)
);

    INSERT INTO accounts VALUES
    ('1', '2023-03-18','a', 'red'),
    ('1', '2023-03-19','a', 'blue'),
    ('2', '2023-03-19','b', 'blue'),
    ('3', '2023-03-20', 'c', 'blue'),
    ('4', '2023-03-21', 'd', 'blue'),
    ('5', '2023-03-22', 'e', 'blue'),
    ('1', '2023-03-23', 'f', 'blue'),
    ('1', '2023-03-24', 'a', 'blue'),
    ('3', '2023-03-24', 'z', 'blue')

I am trying to use the subquery below, but it does not handle the circumstance of a relationship changing and then changing back.

WITH es_account as (
    SELECT
        t1.*,
        t1.load_timestamp AS effective_from,
        LEAD(t1.load_timestamp) OVER (
            PARTITION BY t1.opportunity_id
            ORDER BY
                t1.load_timestamp
        ) AS effective_to
    FROM
        account t1
)
SELECT

*
    
FROM
    es_account;


The desired output can be viewed in sqlfiddle.com with the code below
CREATE TABLE es_accounts (
  account_id VARCHAR(100),
  load_timestamp TIMESTAMP,
  opportunity_id VARCHAR(100),
  effective_from TIMESTAMP,
  effective_to TIMESTAMP
);

INSERT INTO es_accounts VALUES
('1', '2023-03-18','a', '2023-03-18', '2023-03-22' ),
('2', '2023-03-19','b', '2023-03-19', NULL),
('3', '2023-03-20', 'c', '2023-03-20', '2023-03-24'),
('4', '2023-03-21', 'd', '2023-03-21', NULL),
('5', '2023-03-22', 'e', '2023-03-22', NULL),
('1', '2023-03-23', 'f', '2023-03-22', '2023-03-23'),
('1', '2023-03-24', 'a', '2023-03-24', NULL),
('3', '2023-03-24', 'z', '2023-03-24', NULL)
0004
  • 1,156
  • 1
  • 14
  • 49
  • In your sample data is there an example of your problem ("the circumstance of a relationship changing and then changing back")? – EdmCoff Mar 22 '23 at 21:26
  • You want to partition by account_id, not opportunity_id – Adam Kipnis Mar 22 '23 at 21:31
  • What is ``relationship between account & opp started`` and ``relationship between account & opp ended``? Pls, write right result of query. Presume that ``('1', '2023-03-18','a')`` is relation started. Is ``('2', '2023-03-19','b')`` relation ended or ``('1', '2023-03-24', 'a')`` relation end? – ValNik Mar 22 '23 at 22:20
  • @ValNik I went ahead and added the desired output -- does that help? – 0004 Mar 22 '23 at 23:02
  • @0004, are you trying to create a Type 1 or Type 2 dimension table, when you say can revert back? Type 2 tables basically close off the old record and append a new record with a new start_at time (+ is_active field). If you are not familiar with the term, search up Kimball dimensions – Shah Mar 23 '23 at 00:51

2 Answers2

0

As pointed by @Adam Kipnis, you mast use partitioning by account_id. Try query (basically - your query)

select *
  ,load_timestamp effective_from
  ,lead(load_timestamp)over(partition by account_id order by load_timestamp) effective_to
from accounts

This is does not agree with your sample output in place
('1', '2023-03-23', 'f', '2023-03-22', '2023-03-24'),
result consists row
('1', '2023-03-23', 'f', '2023-03-23', '2023-03-24'),
Additional understanding is needed here.

ValNik
  • 1,075
  • 1
  • 2
  • 6
  • ok I edited the result set. so your saying that lead function will handle 1:1 relationship changes and relationships that revert back? Your code threw an error in sqlfiddle – 0004 Mar 23 '23 at 14:24
  • You use mySQL 5.6 version only? I'm not sure about ``lead`` function in v5.6. Example https://dbfiddle.uk/T-MOviW7 – ValNik Mar 23 '23 at 14:51
  • @0004 You're applying too much business logic to what the lead function does. It's no different than doing a group by and order by within your result set and then taking the value from the next row. In your case, you want to group your data by the account id, order it by the load timestamp and for each row in your result set, take the load timestamp of the next row, and set it as the effective end timestamp of the current row. The value of the opportunity id doesn't matter, even if it switches back and forth. – Adam Kipnis Mar 23 '23 at 15:10
  • @AdamKipnis -- I am sorry but I had forgotten one caviot-- post edited. The loadtimestamp updates when any field is changed ie color – 0004 Mar 23 '23 at 15:41
  • That completely changes the scenario. Assuming you ONLY care about building this SCD2 view for the opportunity, you no longer know if the timestamp is because of the opportunity changing or because of another field. For your case, I would look to first create a snapshot table (https://docs.getdbt.com/docs/build/snapshots) which will give you an SCD2 for the entire table. Then I would use a similar query as above against the new snapshot table. Just note that you could pick up multiple changes in between snapshots. – Adam Kipnis Mar 23 '23 at 16:00
  • 1
    I also want to point out that the way you've described it in the question is misleading. The sample data inserts show this is already an SCD2 table (you have 4 records for account_id = 1), but then you're saying the load_timestamp is updated when the color changes. It's confusing. – Adam Kipnis Mar 23 '23 at 16:07
0

You can use below query to find the desired result if you are using snowflake.

SELECT  *, CASE WHEN LAG(LOAD_TIMESTAMP) OVER( PARTITION BY  account_id 
ORDER BY LOAD_TIMESTAMP) IS NULL 
THEN  LOAD_TIMESTAMP ELSE LOAD_TIMESTAMP END  AS FROM_DATE,
IFNULL(DATEADD(d,-1, LEAD(LOAD_TIMESTAMP) OVER( PARTITION BY  account_id 
ORDER BY LOAD_TIMESTAMP )), NULL ) AS TO_DATE
FROM accounts SRC
QUALIFY IFNULL(LAG(src.opportunity_id) OVER (PARTITION BY src.account_id  
ORDER BY SRC.LOAD_TIMESTAMP),'')<> src.opportunity_id
order by load_timestamp
  • This is close enough , but i have a new variation to the question to handle 1-many relationships https://stackoverflow.com/questions/75945018/creating-effectivity-dates-to-model-relations-in-a-table-for-1-many-relationship – 0004 Apr 06 '23 at 00:55