I have some raw data sitting in Big-query tables which are truncate load and my daily ETL feed which runs on these raw BQ tables is a daily snapshot of agents which are the daily extracts below .
To give more background ,I am trying to implement a Insert only table to implement this concept of Virtualized SCD type2 logic .This article focuses on implementing SCD type 2 with delta extracts directly.
my requirement is to design a logic/code to figure out this "Insert only" daily extracts so that I can build that virtualized SCD 2 table. I am thinking to put every daily extract in its own daily partition in Big-query table so that I have all the daily changes in one final table to build this view on.
What is an efficient logic/code/design to find the delta extracts everyday and save it in a table of inserts (may be partioned -see final table) in big query ?
Daily extract on 2022-03-01
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josh | abcde | 2022-03-01 04:14:06 |
Delta Extract on 2022-03-01 should look like
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josh | abcde | 2022-03-01 04:14:06 |
Daily extract on 2022-03-02 (New record for 12346)
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josh | abcde | 2022-03-02 04:14:06 |
12346 | Mark | fghij | 2022-03-02 04:14:06 |
Delta Extract on 2022-03-02 should look like(should Only capture changes)
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12346 | Mark | fghij | 2022-03-02 04:14:06 |
Daily extract on 2022-03-03 (updated record for 12345)
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josher | mnopi | 2022-03-03 04:14:06 |
12346 | Mark | fghij | 2022-03-02 04:14:06 |
Delta Extract on 2022-03-03 should look like(should only capture changes)
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josher | mnopi | 2022-03-03 04:14:06 |
I have to build this final table of Inserts(new +changed records) by appending everyday's delta extract so that I can build a view to calculate my row_end_eff_ts
Final table of Inserts should look like
Agent_Key | Agent_name | MD5_CD | row_eff_ts |
---|---|---|---|
12345 | Josh | abcde | 2022-03-01 04:14:06 |
12346 | Mark | fghij | 2022-03-02 04:14:06 |
12345 | Josher | mnopi | 2022-03-03 04:14:06 |