0

There is a table with millions of records which has duplicate records as well. what is the process of creating a new entity as surrogate key (which denotes sequence no.)

E.g table structure

col1 col2

101 A 101 A 101 B 102 A 102 B

I would like to create a new column (col3) - which denotes a seq no.


col1 col2 col3

101 A 1 101 A 2 101 B 3 102 A 1 102 B 2

Please suggest me steps to follow to create surrogate key for existing records(300 million), and even when new records are loaded ( I assume trigger is needed to while inserting).

Sharanya
  • 1
  • 1
  • 1
  • Please include your Oracle version (such as 11.2.0.4) in your question, as the answer may depend on the version. (Do that for ALL questions you ask here and on other sites - for the same reason!) –  Jul 12 '18 at 19:41

1 Answers1

0

Just use row_number function to populate col3 :

For already existing records apply :

SQL> create table tab(col1 int , col2 varchar2(1));

Table created
SQL> insert all
  2         into tab values(101,'A')
  3         into tab values(101,'A')
  4         into tab values(101,'B')
  5         into tab values(102,'A')
  6         into tab values(102,'B')
  7  select * from dual;

5 rows inserted
SQL> create table tab_ as
  2  select col1, col2,
  3         row_number() over (partition by col1 order by col2) as col3
  4    from tab;

Table created
SQL> drop table tab;

Table dropped
SQL> alter table tab_ rename to tab;

Table altered

OR Alternatively ( without recreating the table ) :

SQL> create table tab(col1 int , col2 varchar2(1));

Table created
SQL> insert all
  2            into tab values(101,'A')
  3            into tab values(101,'A')
  4            into tab values(101,'B')
  5            into tab values(102,'A')
  6            into tab values(102,'B')
  7     select * from dual;

5 rows inserted
SQL> alter table tab add col3 integer;

Table altered
SQL> declare
  2   i pls_integer := 0;    
  3 begin
  4   for c in
  5     (
  6      select rowid, col1, col2,
  7         row_number() over (partition by col1 order by col2) as col3
  8        from tab
  9     )
 10   loop
 11    update tab t
 12       set t.col3 = c.col3
 13     where t.rowid = c.rowid;
 14     i:= i+1;
 15     if ( ( i mod 10000 ) = 0 ) then commit; end if;
 16   end loop;
 17  end;     
 18  commit;
 19  /

PL/SQL procedure successfully completed

SQL> select * from tab;

COL1 COL2  COL3
---- ---- -----
  101 A       1
  101 A       2
  101 B       3
  102 A       1
  102 B       2

  5 rows selected

For upcoming (newly inserted) records you may use a trigger as you mentioned :

SQL> create or replace trigger trg_ins_tab
  2  before insert on tab
  3  referencing new as new old as old for each row
  4  declare
  5  begin
  6      select nvl(max(col3),0) + 1
  7        into :new.col3
  8        from tab
  9       where col1 = :new.col1;
 10  end;
 11  /

Trigger created
SQL> insert into tab(col1,col2) values(101,'C');

1 row inserted
SQL> select *
  2    from tab t
  3   order by t.col1, col3;

COL1 COL2  COL3
---- ---- -----
  101 A       1
  101 A       2
  101 B       3
  101 C       4
  102 A       1
  102 B       2

  6 rows selected
Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • How about when new records are loaded to this table, trigger is not needed to generate sequence number? – Sharanya Jul 12 '18 at 20:15
  • @Sharanya you already have a logic to get the desired values and no need to populate another table's column through any means such as db triggers. – Barbaros Özhan Jul 12 '18 at 20:20
  • The above works will work for the existing records. If a job runs and loads the table again, it needs to populate the seq no likewise. I can create a new table for existing records and populate the new column, I am expecting a table structure change. – Sharanya Jul 12 '18 at 20:32
  • @Sharanya even if you want to populate a column, no need a db trigger, using the above query as a cursor in a pl/sql procedure, you may update newly added column(col3). – Barbaros Özhan Jul 12 '18 at 20:48
  • how can i write the above query to update the table with seq no, when new records are inserted and when existing records are inserted to this table? – Sharanya Jul 16 '18 at 13:21
  • @Sharanya I've edited the answer. – Barbaros Özhan Jul 16 '18 at 14:46
  • Thank you ! That was helpful. :) – Sharanya Jul 16 '18 at 15:36
  • Trigger works for when existing values are added to the table( like 101, 102), but for a completely new record , e.g 104 , then record is inserted with 'null' in col3. Is there was it can be fixed for new values? – Sharanya Jul 16 '18 at 15:41
  • if (COUNT)> 1 then update with max seq no -- for existing records else insert with '1' in col3 -- for new records e.g 104 end if – Sharanya Jul 16 '18 at 15:56
  • @Sharanya you're right, updated again as `max(col3)` to `nvl(max(col3),0)` – Barbaros Özhan Jul 16 '18 at 16:00
  • @Ozhan I have been trying to update about 300 million records, update has been running for more than couple of hours. **Is bulk update is possible in this scenario ? ** – Sharanya Jul 20 '18 at 17:30
  • For Insert operations we may use `Bulk Insert` for performance gain but I don't know any concept called `Bulk Update`, but I can suggest you to commit your updates regularly as some number of update reached. – Barbaros Özhan Jul 20 '18 at 19:20
  • @Ozhan - Update takes about 8-10 hours to update 300+ million records. I tried to 1. take a back up of the existing table records, 2. truncate original table, 3. added new columns , 4. Created trigger for adding seq no, 5. Inserting all the back up records to original table. I get " Mutating error", I tried to add "PRAGMA AUTONOMOUS_TRANSACTION" in trigger but unique seq no. are not loading. Please suggest me a way to be able to insert all the records without mutating error. Thank you ! – Sharanya Jul 30 '18 at 18:42