
I have three tables, plus a log table:

  • Table A: 170 million rows
  • Table B: 170 million rows
  • Table C: 130 million rows
  • Table Log

I want to update column colA1 with the value of colC4 where colA2 = colB2 and colB3 = colC3. I created a table:

  • Table D: 80 million rows, storing colB2 and colC3 in order to speed up the query.

I also added an index on colB3 in Table D.
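
For reference, the staging table and its index might have been created roughly like this (a sketch only; the table and column names follow the question, and the index here is put on colC3 because that is the column the loop below filters on):

-- Sketch: staging table holding the (colB2, colC3) pairs from the join of B and C
create table tableD as
    select b.colB2, c.colC3
    from tableB b
    join tableC c on b.colB3 = c.colC3;

-- Index on the column the per-iteration subquery filters on
create index tableD_colC3_idx on tableD (colC3);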

Table Log is used to store a progress message and completion time every time an iteration completes (see the script).

Currently my update script looks like this:

v_str_log := 'Begin updating';
p_write_to_log(v_str_log);
commit;

for data_list in
(
    select distinct colC4
    from tableC
)
loop
    -- Update every row of tableA whose colA2 maps to the current colC4 value
    update tableA
    set colA1 = data_list.colC4
    where colA2 in
    (
        select colB2
        from tableD
        where colC3 = data_list.colC4
    );

    var_total := var_total + sql%rowcount;
    v_str_log := 'Updated ' || sql%rowcount || ' for ' || data_list.colC4 || ' with total ' || var_total || ' rows.';
    p_write_to_log(v_str_log);
    commit;
end loop;

I ran the procedure and it finished in about 6 hours. But I noticed from the log that in the first iteration about 3 million rows were updated in 3 minutes, while a few iterations later about 5 million rows took roughly 20 minutes. The later iterations did not run as fast as the first ones.

Why could that happen? Can the script be optimized?

Hary Kamaro
  • Look at the MERGE INTO statement. – q4za4 Jan 16 '19 at 08:31
  • Why not use BULK COLLECT or FORALL, considering there are millions of records to update? Have a look at [this](https://blogs.oracle.com/oraclemagazine/bulk-processing-with-bulk-collect-and-forall) – Jacob Jan 16 '19 at 08:41
  • And consider removing the commit from inside the loop. This also saves you a lot of time. If something crashes during execution - let's say after 30,000,000 records - those 30,000,000 records will still be waiting for a commit. – q4za4 Jan 16 '19 at 08:43
  • @q4za4 thank you very much for your suggestion about removing the commit from the loop. It reduced the execution time from about 4 hours to 2 hours. :) – Hary Kamaro Jan 17 '19 at 04:22
  • @q4za4 will the MERGE INTO command significantly improve the query above, or is it just for readability? – Hary Kamaro Jan 17 '19 at 04:59
  • You're welcome :) For large data sets like this it should improve your performance :) – q4za4 Jan 24 '19 at 06:31
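
For reference, a single-statement rewrite along the lines of the MERGE INTO suggestion in the comments might look roughly like this (a sketch only; it uses the table and column names from the question and assumes each colB2 value maps to exactly one colC4 value, otherwise Oracle raises ORA-30926):

merge into tableA a
using (
    -- source rows: the value to assign (colC4) for each join key (colB2)
    select b.colB2, c.colC4
    from tableB b
    join tableC c on b.colB3 = c.colC3
) s
on (a.colA2 = s.colB2)
when matched then
    update set a.colA1 = s.colC4;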

1 Answer


You have to look at the Updatable Join View feature of Oracle - this is exactly your use case.

First join all the tables in a subquery (this is effectively done with a hash join, without the slow row-by-row loops).

Then UPDATE the subquery.

Example

create table a
(col1 number,
col2 number);

create table b
(col2 number,
col3 number);

create table c
(col3 number,
col4 number);

insert into a values(null,1);
insert into b values(1,2);
insert into c values(2,1);


update (
select a.col1, c.col4
from a 
join b on a.col2 = b.col2
join c on b.col3 = c.col3
)
set col1 = col4
;

Note that an updatable join view has some preconditions. When they are not met, the update fails with

ORA-01779: cannot modify a column which maps to a non key-preserved table

In your case you must back tables B and C with unique indexes to guarantee a key-preserved view:

create unique index c_idx on c(col3);
create unique index b_idx on b(col2);

If this is not possible on your production tables, simply use your temporary table and create the unique index on it; the updated view will then be simpler, joining only two tables.
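
Applied to the question's staging table, that two-table form might look roughly like this (a sketch only; it assumes tableD holds the (colB2, colC3) pairs as described, that colC3 carries the value to be assigned - as in the original loop - and that colB2 can be made unique):

-- The unique index makes tableD key-preserved relative to tableA
create unique index tableD_colB2_uidx on tableD (colB2);

update (
    select a.colA1, d.colC3
    from tableA a
    join tableD d on a.colA2 = d.colB2
)
set colA1 = colC3;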

Marmite Bomber