0

I am trying to solve the issue in a query without writing a function. I have a table which has user information as below:

user_id user_name user_phone_number user_location created_at
1         abc          123              X         2014-01-01
1         abc          123              X         2014-02-01
1         abc          123              Y         2014-10-01
1         abc          123              Z         2014-11-01
1         abc          123              Z         2014-12-01
1         abc          123              X         2015-01-01

I need to make this to SCD-Type II table as below:

user_id user_name user_ph_num user_location valid_from_dt valid_to_dt   
1         abc        123         X           2014-01-01   2014-09-30
1         abc        123         Y           2014-10-01   2014-10-31
1         abc        123         Z           2014-11-01   2014-12-31
1         abc        123         X           2015-01-01   2999-12-31

what I thought to make this happen is:

In one step: temp_table1:

id user_name user_ph_num user_location created_at is_same_with_previous
1         abc     123              X   2014-01-01          f
1         abc     123              X   2014-02-01          t
1         abc     123              Y   2014-10-01          f
1         abc     123              Z   2014-11-01          f
1         abc     123              Z   2014-12-01          t
1         abc     123              X   2015-01-01          f

In second step:

temp_table2:

id user_name user_ph_num user_loc created_at is_same_with_previous sr_no
1         abc     123        X    2014-01-01          f             1
1         abc     123        X    2014-02-01          t             1
1         abc     123        Y    2014-10-01          f             2
1         abc     123        Z    2014-11-01          f             3
1         abc     123        Z    2014-12-01          t             3
1         abc     123        X    2015-01-01          f             4

After this, I can query as below and get the desired result:

select id,user_name,user_ph_num,user_loc,min(created_at) as valid_from,
max(created_at) as valid_to
from temp_table2
group by sr_no,id,user_name,user_ph_num,user_loc;

I created step-1 and I am failing to create temp_table2(step-2). I am using the following query to accomplish that:

select setval('ser_no',2);

select *
CASE WHEN is_same_with_previous 
THEN (SELECT setval('ser_no',nextval('ser_no')-1))
ELSE nextval('ser_no') END as diff_sr_no
from temp_table1;

It is resulting as: temp_table2:

id user_name user_ph_num user_loc created_at is_same_with_previous sr_no
1         abc     123        X    2014-01-01          f             1
1         abc     123        X    2014-02-01          t             1
1         abc     123        Y    2014-10-01          f             2
1         abc     123        Z    2014-11-01          f             3
1         abc     123        Z    2014-12-01          t             1
1         abc     123        X    2015-01-01          f             4

Could some one help to solve this issue? Am I using correct approach? Thanks in advance!!!

1 Answers1

0

You are using wrong approach. You should take a look at "window functions" in databases, you might need "row_number()", "rank()" and "lag()". Usually creating SCD2 can be done in a single query. In your case the query should:

  1. Remove duplicates on all fields
  2. Apply rank() over partition by all fields except by "created_at" and order by "created_at"
  3. For each rank select minimum "created_at"
  4. "created_at" is valid_from, "lag(created_at) over partition by ( order by created_at desc)" as valid_to
0x0FFF
  • 4,948
  • 3
  • 20
  • 26