Oracle: track the history on a table with timestamp columns

Question

I have two tables in an Oracle 12c database with the structure below. Table A holds the incoming data from an application, with modified-date timestamps; each day we may get around 50,000 rows in table A. The goal is to insert into the final target table B (which usually has billions of rows), using table A's data as the driving data set.

A record needs to be inserted/merged into table B only when there is a change in the incoming attributes. Basically, the purpose is to track the history/journey of a given product with valid timestamps, adding a row only when attributes such as state and zip_cd change.

See table structures below

Table A  ( PRODUCT_ID, STATE, ZIP_CD, Modified_dt)
                
           'abc',  'MN', '123', '3/5/2020 12:01:00 AM'
           'abc',  'MN', '123', '3/5/2020  6:01:13 PM'
           'abc',  'IL', '223', '3/5/2020  7:01:15 PM'
           'abc',  'OH', '333', '3/5/2020  6:01:16 PM'
           'abc',  'NY', '722', '3/5/2020  4:29:00 PM' 
           'abc',  'KS', '444', '3/5/2020  4:31:41 PM'    
           'bbc',  'MN', '123', '3/19/2020 2:47:08 PM' 
           'bbc',  'IL', '223', '3/19/2020 2:50:37 PM' 
           'ccb',  'MN', '123', '3/21/2020 2:56:24 PM'
           'dbd',  'KS', '444', '6/20/2020 12:00:00 AM'  

Target Table B  (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To,  LATEST_FLAG)
                '1',    'abc',    'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM',   'N'
                '2',    'abc',    'AR', '555', '3/3/2020 6:01:14  PM',  '3/3/2020 6:01:14 PM',  'N'
                '3',    'abc',    'CA', '565', '3/3/2020 6:01:15  PM',  '3/4/2020 4:28:59 PM',  'N'
                '4',    'abc',    'CA', '777', '3/4/2020 4:29:00  PM',  '12/31/2099',           'Y'
                '5',    'bbc',    'MN', '123', '3/4/2020 4:31:41  PM',  '3/19/2020 2:47:07 PM', 'N'
                '6',    'bbc',    'MN', '666', '3/18/2020 2:47:08 PM',  '3/19/2020 2:50:36 PM', 'N'
                '7',    'bbc',    'MN', '777', '3/18/2020 2:50:37 PM',  '12/31/2099',           'Y'
                '8',    'ccb',    'MN', '123', '3/20/2020 2:56:24 PM',  '12/31/2099',           'Y'

Rules for populating data into table B:

1. The primary key on the output table is (product_id, valid_from). The incoming data from table A will always have modified_dt timestamps greater than those already in table B.
2. In order to insert data, we have to compare the latest_flag = 'Y' record in target table B with the incoming data from table A; a record is inserted into table B from table A only when there is a change in state or zip_cd. valid_to is a calculated field that is always 1 second lower than the next row's valid_from, and for the latest row it defaults to '12/31/2099'. Similarly, latest_flag is a calculated column that indicates the current row for a given product_id.
3. If the incoming dataset has multiple rows without any changes compared to the previous row or to the existing data in table B (latest_flag = 'Y'), those should be ignored as well. For example, row 2 and row 9 of table A are ignored because there are no changes in state or zip_cd compared to their previous rows for that product.

Based on the above rules, I need to merge the table A data into table B, and the final output should look like this:

Table B  (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To,  LATEST_FLAG)
                '1',    'abc',    'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM',  'N'
                '2',    'abc',    'AR', '555', '3/3/2020 6:01:14  PM', '3/3/2020 6:01:14 PM',  'N'
                '3',    'abc',    'CA', '565', '3/3/2020 6:01:15  PM', '3/4/2020 4:28:59 PM',  'N'
                '4',    'abc',    'CA', '777', '3/4/2020 4:29:00  PM', '3/5/2020 12:00:00 AM', 'N'
                '5',    'abc',    'MN', '123', '3/5/2020 12:01:00 AM', '3/5/2020 7:01:14  PM', 'N'
                '6',    'abc',    'IL', '223', '3/5/2020  7:01:15 PM', '3/5/2020 6:01:15 PM',  'N'
                '7',    'abc',    'OH', '333', '3/5/2020  6:01:16 PM', '3/5/2020 4:28:59 PM',  'N'
                '8',    'abc',    'NY', '722', '3/5/2020  4:29:00 PM', '3/5/2020 4:31:40  PM', 'N'
                '9',    'abc',    'KS', '444', '3/5/2020  4:31:41 PM', '12/31/2099',           'Y'
                '10',   'bbc',    'MN', '123', '3/4/2020 4:31:41  PM', '3/19/2020 2:47:07 PM', 'N'
                '11',   'bbc',    'MN', '666', '3/18/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
                '12',   'bbc',    'MN', '777', '3/18/2020 2:50:37 PM', '3/19/2020 2:47:07 PM', 'N'
                '13',   'bbc',    'MN', '123', '3/19/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
                '14',   'bbc',    'IL', '223', '3/19/2020 2:50:37 PM', '12/31/2099',           'Y'
                '15',   'ccb',    'MN', '123', '3/20/2020 2:56:24 PM', '12/31/2099',           'Y'
                '16',   'dbd',    'KS', '444', '6/20/2020 12:00:00 AM', '12/31/2099',          'Y'

Looking for suggestions to solve this problem. LIVE SQL link:

https://livesql.oracle.com/apex/livesql/s/kfbx7dwzr3zz28v6eigv0ars0

Thank you.

Is table B partitioned somehow? You said it has billions of rows, so my assumption is that the table is partitioned. That is very important in order to achieve partition pruning. — Roberto Hernandez, Jul 22 '20 at 17:25
Hi Roberto, yes, the table is partitioned on latest_flag for partition pruning. — user1751356, Jul 22 '20 at 17:31
Are you telling me that the table with billions of records is partitioned by latest_flag, a field that can only be Y or N? How many records do you have in each partition? Are there other indexes besides the composite primary key? — Roberto Hernandez, Jul 22 '20 at 20:46
You said "in order to insert data, we will have to compare latest_flag = 'Y' record from target table B and the incoming data from table A". There is no latest_flag in table A, so what do you want to compare? — Roberto Hernandez, Jul 22 '20 at 20:53
So basically we don't need latest_flag in table A. I think the data in table A needs to be ordered first and then checked to see whether the product exists in table B with latest_flag = 'Y'. Since this is somewhat complex, it's fine if we leverage some intermediate temp tables in between. — user1751356, Jul 23 '20 at 02:22
OK... why is there no `latest_flag = 'Y'` for the `('bbc', 'MN')` pair? — Harshil Doshi, Jul 25 '20 at 15:47
As I mentioned in the question, the primary key is the product id, so the latest value of the state field for product bbc is not MN. It has been changed to IL based on the incoming data from table A; see row 14 in the final output. — user1751356, Jul 25 '20 at 16:33
There are some contradictions between your statements and the data you provided in Live SQL. In point 3 of the question you say row 2 should be ignored because there is no change, but if I compare it with the latest record for product 'abc' in table B I can see changes ('MN'-123 vs 'KS'-444). Second, should the modified date be > the valid_from of the latest record in table B, or >= (as in the Live SQL link)? Kindly check. — Sujitmohanty30, Jul 27 '20 at 09:11

score 0 · Answer 1 · answered Jul 27 '20 at 12:14

I'll give it a first try based on my understanding. The cursor used as the source for inserting into table B would look like this:

SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL
                  AND (   a.state  != b.state
                       OR a.zip_cd != b.zip_cd))  -- a change in either attribute counts
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0;

The LEFT OUTER JOIN checks whether the record exists in table B, and the WHERE clause checks that modified_dt is greater than or equal to the valid_from of the latest_flag = 'Y' row.
The inner CASE expression tells us whether the attributes have changed; if the product_id is not present in table B, the row is treated as a first entry and insert_flag is 1.
The outer CASE expression sets latest_flag to 'Y' for the last record per product by modified date, whose valid_to defaults to 31-12-2099.
I'm not completely clear about point 3, but I believe the CASE expression is what we need for it.
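
For point 3, one possible approach is to compare each row with the previous row of the same product using LAG, rather than only with the current row in table B. The sketch below is not part of the original answer. It uses a few hypothetical inline rows (in practice the input would be table_a together with the latest_flag = 'Y' rows of table_b), and it assumes state and zip_cd are never NULL.

```sql
-- Hypothetical sample rows standing in for "current row of table_b + table_a".
WITH src AS (
  SELECT 'abc' AS product_id, 'CA' AS state, '777' AS zip_cd,
         TO_DATE('2020-03-04 16:29:00', 'YYYY-MM-DD HH24:MI:SS') AS modified_dt
  FROM dual UNION ALL
  SELECT 'abc', 'MN', '123', TO_DATE('2020-03-05 00:01:00', 'YYYY-MM-DD HH24:MI:SS') FROM dual UNION ALL
  SELECT 'abc', 'MN', '123', TO_DATE('2020-03-05 18:01:13', 'YYYY-MM-DD HH24:MI:SS') FROM dual UNION ALL
  SELECT 'abc', 'IL', '223', TO_DATE('2020-03-05 19:01:15', 'YYYY-MM-DD HH24:MI:SS') FROM dual
),
flagged AS (
  -- Attach the previous row's attributes within each product.
  SELECT s.product_id, s.state, s.zip_cd, s.modified_dt,
         LAG(s.state)  OVER (PARTITION BY s.product_id ORDER BY s.modified_dt) AS prev_state,
         LAG(s.zip_cd) OVER (PARTITION BY s.product_id ORDER BY s.modified_dt) AS prev_zip_cd,
         ROW_NUMBER()  OVER (PARTITION BY s.product_id ORDER BY s.modified_dt) AS rn
  FROM   src s
),
changes AS (
  -- Keep the first row per product and every row that differs from its predecessor.
  SELECT product_id, state, zip_cd, modified_dt AS valid_from
  FROM   flagged
  WHERE  rn = 1
     OR  state  <> prev_state
     OR  zip_cd <> prev_zip_cd
)
SELECT product_id, state, zip_cd, valid_from,
       -- valid_to is one second before the next kept row, or the 2099 default.
       NVL(LEAD(valid_from) OVER (PARTITION BY product_id ORDER BY valid_from)
             - INTERVAL '1' SECOND,
           DATE '2099-12-31') AS valid_to,
       CASE WHEN LEAD(valid_from) OVER (PARTITION BY product_id ORDER BY valid_from) IS NULL
            THEN 'Y' ELSE 'N' END AS latest_flag
FROM   changes
ORDER  BY product_id, valid_from;
```

With these sample rows the second MN/123 row is dropped, and the three remaining rows get contiguous valid_from/valid_to ranges, with only the IL/223 row flagged 'Y'. Computing LEAD after the filter, not before it, is what keeps the ranges free of gaps.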

In the end, I didn't consider performance here. You could think of converting it to a PL/SQL block and using collection methods to process the data in chunks.

I also have one question: what happens to the record with product id "dbd" (a new entry that doesn't exist in table B) if it is present multiple times in table A?
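
The query above only produces new rows. The current row in table B (latest_flag = 'Y') also has to be closed when a change arrives, as the expected output in the question shows for product 'abc'. A minimal sketch of that step, assuming table_a and table_b as named in the query above and non-NULL state and zip_cd:

```sql
-- Close the current version of every product that has a differing incoming row.
-- The earliest differing row in table_a marks the start of the next version.
UPDATE table_b b
SET    b.valid_to    = (SELECT MIN(a.modified_dt) - INTERVAL '1' SECOND
                        FROM   table_a a
                        WHERE  a.product_id  = b.product_id
                        AND    a.modified_dt > b.valid_from
                        AND   (a.state <> b.state OR a.zip_cd <> b.zip_cd)),
       b.latest_flag = 'N'
WHERE  b.latest_flag = 'Y'
AND    EXISTS (SELECT 1
               FROM   table_a a
               WHERE  a.product_id  = b.product_id
               AND    a.modified_dt > b.valid_from
               AND   (a.state <> b.state OR a.zip_cd <> b.zip_cd));
```

Since the asker says the table is partitioned on latest_flag, this update changes the partition key, which in Oracle needs row movement enabled on the table (ALTER TABLE table_b ENABLE ROW MOVEMENT); without it the statement fails with ORA-14402.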

For point 3, the best way to explain it is to think of a slowly changing dimension in a data warehouse: a row gets inserted only when there is a change in any of the attributes for a given natural key (product_id in this case). — user1751356, Jul 27 '20 at 12:22
The answer you provided might work, but as you mentioned it will take more time to loop through each record. I am not that familiar with PL/SQL. For product dbd, if there are multiple rows in table A, all the rows need to be verified to make sure that only the rows with at least one change are inserted into table B. — user1751356, Jul 27 '20 at 12:26
We probably do not need a loop, as it is always an insertion into table B. I am only unsure about the case I mentioned in my last point: a product received for the first time in table A and present multiple times. Did you get what I mean? — Sujitmohanty30, Jul 27 '20 at 12:30
Yes, the same logic applies to all new incoming products such as dbd. So for dbd, if there are multiple rows indicating changes at different times, only the records with changes get inserted into table B, with no overlaps or gaps between the valid_from and valid_to columns. — user1751356, Jul 27 '20 at 12:34
Thanks. However, could you at least check with this and find the failing cases? Then we could try to modify the query as per your need. — Sujitmohanty30, Jul 27 '20 at 12:43
@user1751356: I was hoping you would update with your status. — Sujitmohanty30, Aug 03 '20 at 19:24

score 0 · Answer 2 · answered Jul 28 '20 at 05:35

This is a Slowly Changing Dimension (SCD) Type 2 problem in data warehousing (Kimball approach). You can see a short definition here:

https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/slowlychangingdimensions.htm

Support for SCD Type 2 is available only in the Enterprise ETL option of OWB 10gR2, as described in the link above. If that's not available and you have to use PL/SQL, you can check out the following approach. Unfortunately, unlike MS SQL, Oracle PL/SQL does not offer a straightforward solution.

Implementing Type 2 SCD in Oracle

Roberto Hernandez · Accepted Answer · 2020-07-29T07:36:47.270

I tried to see how to do this in plain SQL, but I couldn't manage it because of the logic and also the sequence_key reset that you have in your desired output.

So, here is my suggestion in PL/SQL:

SQL> select * from table_a ;

PRODUCT_ID                     STATE                          ZIP_CD                         MODIFIED_
------------------------------ ------------------------------ ------------------------------ ---------
abc                            MN                             123                            05-MAR-20
abc                            MN                             123                            05-MAR-20
abc                            IL                             223                            05-MAR-20
abc                            OH                             333                            05-MAR-20
abc                            NY                             722                            05-MAR-20
abc                            KS                             444                            05-MAR-20
bbc                            MN                             123                            19-MAR-20
bbc                            IL                             223                            19-MAR-20
ccb                            MN                             123                            19-MAR-20
dbd                            KS                             444                            19-MAR-20

10 rows selected.

SQL> select * from table_b ;

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
           1 abc                            AR                             999                            05-MAR-20 05-MAR-20 N
           2 abc                            AR                             555                            05-MAR-20 05-MAR-20 N
           3 abc                            CA                             565                            05-MAR-20 05-MAR-20 N
           4 abc                            CA                             777                            05-MAR-20 31-DEC-99 Y
           5 bbc                            MN                             123                            05-MAR-20 05-MAR-20 N
           6 bbc                            MN                             666                            05-MAR-20 05-MAR-20 N
           7 bbc                            MN                             777                            19-MAR-20 31-DEC-99 Y
           8 ccb                            MN                             123                            19-MAR-20 31-DEC-99 Y

8 rows selected.

Now, I used this piece of PL/SQL code:

declare 
type typ_rec_set IS  RECORD
(
 PRODUCT_ID        VARCHAR2(30 CHAR),
 STATE             VARCHAR2(30 CHAR),
 ZIP_CD            VARCHAR2(30 CHAR),
 VALID_FROM        DATE             ,
 VALID_TO          DATE             ,
 LATEST_FLAG       VARCHAR2(1 CHAR) 
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab  typ_rec_tab;
begin
SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
      BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL 
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0  ;
--loop
FOR i IN 1 .. l_hdr_tab.count  -- also safe when no rows were collected
LOOP
    -- begin block 
    begin
        insert into table_b 
        (
         sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG                 
        )
        values
        (
        ( select max(sequence_key)+1 from table_b ),
        l_hdr_tab(i).product_id ,
        l_hdr_tab(i).state ,
        l_hdr_tab(i).zip_cd ,
        l_hdr_tab(i).valid_from ,
        l_hdr_tab(i).valid_to ,
        l_hdr_tab(i).latest_flag 
        );
     end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select     sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG  ,
         row_number() over ( order by product_id,valid_from ) as new_seq 
          from table_b ) s
on ( s.rowid = t.rowid ) 
when matched then 
  update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/

Then I ran it (saved as proc.sql):

SQL> @proc.sql

PL/SQL procedure successfully completed.

SQL> select * from table_b order by sequence_key ;

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
           1 abc                            AR                             999                            05-MAR-20 05-MAR-20 N
           2 abc                            NY                             722                            05-MAR-20 05-MAR-20 N
           3 abc                            CA                             777                            05-MAR-20 31-DEC-99 Y
           4 abc                            KS                             444                            05-MAR-20 05-MAR-20 N
           5 abc                            MN                             123                            05-MAR-20 05-MAR-20 N
           6 abc                            AR                             555                            05-MAR-20 05-MAR-20 N
           7 abc                            CA                             565                            05-MAR-20 05-MAR-20 N
           8 abc                            OH                             333                            05-MAR-20 05-MAR-20 N
           9 abc                            IL                             223                            05-MAR-20 31-DEC-99 Y
          10 bbc                            MN                             666                            05-MAR-20 05-MAR-20 N
          11 bbc                            MN                             123                            05-MAR-20 05-MAR-20 N

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
          12 bbc                            MN                             777                            19-MAR-20 31-DEC-99 Y
          13 bbc                            IL                             223                            19-MAR-20 31-DEC-99 Y
          14 ccb                            MN                             123                            19-MAR-20 31-DEC-99 Y
          15 dbd                            KS                             444                            19-MAR-20 31-DEC-99 Y

15 rows selected.

SQL>

Just let me know if you have any doubts. I'm sure I have missed something ;)

UPDATE

I realized that I had a useless operation in the loop: calculating the max value of the SEQUENCE_KEY field on every iteration. Here is a better version of the procedure:

declare
type typ_rec_set IS  RECORD
(
 PRODUCT_ID        VARCHAR2(30 CHAR),
 STATE             VARCHAR2(30 CHAR),
 ZIP_CD            VARCHAR2(30 CHAR),
 VALID_FROM        DATE             ,
 VALID_TO          DATE             ,
 LATEST_FLAG       VARCHAR2(1 CHAR) 
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab  typ_rec_tab;
r    pls_integer := 1;
vseq pls_integer;
begin
-- calculate value sequence 
select nvl(max(sequence_key), 0) into vseq from table_b ;  -- 0 when table_b is empty
SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
      BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL 
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0  ;
--loop
FOR i IN 1 .. l_hdr_tab.count  -- also safe when no rows were collected
LOOP
    -- begin block 
    vseq := vseq + 1 ;  -- next sequence value
    begin
        insert into table_b 
        (
         sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG                 
        )
        values
        (
        vseq ,
        l_hdr_tab(i).product_id ,
        l_hdr_tab(i).state ,
        l_hdr_tab(i).zip_cd ,
        l_hdr_tab(i).valid_from ,
        l_hdr_tab(i).valid_to ,
        l_hdr_tab(i).latest_flag 
        );
     end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select     sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG  ,
         row_number() over ( order by product_id,valid_from ) as new_seq 
          from table_b ) s
on ( s.rowid = t.rowid ) 
when matched then 
  update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
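
Since every selected row is simply inserted, the loop could probably be replaced by one INSERT ... SELECT. The following is only a sketch: new_versions is a hypothetical staging table (the asker mentioned that intermediate temp tables would be acceptable) that already holds the rows to insert, with the same columns as the collection above.

```sql
-- new_versions is hypothetical: product_id, state, zip_cd, valid_from, valid_to, latest_flag.
-- New keys continue after the current maximum, in (product_id, valid_from) order.
INSERT INTO table_b
       (sequence_key, product_id, state, zip_cd, valid_from, valid_to, latest_flag)
SELECT (SELECT NVL(MAX(sequence_key), 0) FROM table_b)
         + ROW_NUMBER() OVER (ORDER BY n.product_id, n.valid_from),
       n.product_id, n.state, n.zip_cd, n.valid_from, n.valid_to, n.latest_flag
FROM   new_versions n;
```

This avoids the row-by-row context switches of the loop. Whether the follow-up MERGE that renumbers sequence_key across the whole table is affordable on billions of rows is a separate question that would need testing.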
