I am new to Oracle and have been trying to improve the performance of an update on a huge table (500+ million records) - no matter what I try, the performance is terrible.

I am trying to UPDATE a table with 500 million records using 50 parallel SQL*Plus sessions with BULK COLLECT ... LIMIT and FORALL, which is taking a huge amount of time (Elapsed: 04:32:30.29).

  CURSOR NIT_CURSOR IS
  SELECT SA_ROWID, NEW_TRX_1, NEW_TRX_2
  FROM NIT_SA
  WHERE MOD(DBMS_ROWID.rowid_block_number(SA_ROWID),&2) = &1;
Note: &2 is 50, whereas &1 is the instance number passed in from the shell (0-49); the MOD on the block number is what spreads the rows across the 50 sessions. SA_ROWID is the actual ROWID from the SA table.

Now, the UPDATE statement is as below (each session commits every 10,000 rows):

  C_COMMIT         CONSTANT PLS_INTEGER := 10000;

  OPEN NIT_CURSOR;
  LOOP
    FETCH NIT_CURSOR BULK COLLECT INTO BLK_NIT_SA LIMIT C_COMMIT;
    EXIT WHEN BLK_NIT_SA.COUNT = 0;

    FORALL indx IN 1 .. BLK_NIT_SA.COUNT
      UPDATE SA
      SET INS_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_1,
          UPD_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_2
      WHERE ROWID = BLK_NIT_SA(indx).SA_ROWID;

    COMMIT;
  END LOOP;
  CLOSE NIT_CURSOR;

This script is invoked from KSH as 50 parallel SQL*Plus sessions:

typeset -A sqlFile
sqlFile[2.3.sql]=50
for counter_i in "${!sqlFile[@]}"
do
  # brace ranges do not expand variables, so use an arithmetic for loop
  for (( counter_j = 0; counter_j < ${sqlFile[$counter_i]}; counter_j++ ))
  do
     sqlplus -s ${DATABASE_CONNECT} @${SQLFILE} "${SPOOLFILE}" ${sqlFile[$counter_i]} $counter_j &
  done
done
wait   # block until all background sessions have finished

As per my understanding, since I am accessing the rows of table SA directly by ROWID, which should be the fastest access path, 50 parallel sessions ought to process the data much faster. Yet I observe the opposite: as soon as I increase the number of parallel SQL sessions from 50 to 100, the elapsed time of each process grows from about 4.5 hours to 7.5 hours.

I am aiming to finish in 1.5 to 2 hours; I am not sure whether that is realistic.

Can someone please help me with the above?

Nitin Tripathi
  • Are you using Enterprise Edition, and would it be acceptable to lock the entire table while the update is happening? (If so, a parallel update statement would be a faster way to make this change.) – Jon Heller Mar 12 '19 at 04:05
  • @JonHeller, yes, this is Enterprise Edition, and I really doubt the entire table is getting locked, as each session commits every 10K records. Would partitioning table SA help in this scenario? – Nitin Tripathi Mar 12 '19 at 04:21
  • Are those fields indexed? How many indexes? – Dmitry Demin Mar 12 '19 at 04:57
  • What kind of machine do you have? Unless you have 50 cores I don't think it makes much sense to try "parallel(50)" or even 100. – Wernfried Domscheit Mar 12 '19 at 08:58
  • It is rather risky to use `ROWID` if the rows are not locked. In principle, the ROWID may change at any time. Use at least `CURSOR NIT_CURSOR IS ... FOR UPDATE` – Wernfried Domscheit Mar 12 '19 at 08:59
  • @DmitryDemin, NIT_SA does not have any indexes, whereas the SA table did - I dropped its indexes to make sure the update does not waste time maintaining them; once the table is fully updated, I will recreate them. – Nitin Tripathi Mar 12 '19 at 12:22
  • @WernfriedDomscheit, Thank you for the suggestion, that was really necessary. – Nitin Tripathi Mar 12 '19 at 12:23
  • @NitinTripathi You can watch the progress of the UPDATE/INSERT/DELETE: https://stackoverflow.com/questions/54215043/oracle-11-2-0-1-how-to-identify-the-row-which-is-currently-updated-by-the-up/54251384#54251384 – Dmitry Demin Mar 12 '19 at 12:32

2 Answers

Updating 500+ million records in place is not a good idea. If you can share the table structure and which columns need to be updated, we can consider the following options:

  1. If the required new values can be calculated dynamically, we can create a view on top of the table, so that whenever we read the view we get the required values.

  2. Create a new table and insert all the data from the existing table as required (with the new values for those columns), then drop the old table and rename the new one, as sketched below.
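
A minimal sketch of option 2, assuming the new values come from NIT_SA keyed by SA_ROWID as in the question; the placeholder column list, degree of parallelism, and NOLOGGING are assumptions for illustration:

-- rebuild instead of update: rows with a match in nit_sa get the new
-- values, all other rows are copied unchanged (assumes new_trx_1/2 are
-- never legitimately null)
create table sa_new nologging parallel 8 as
select /*+ parallel(8) */
       nvl(n.new_trx_1, s.ins_trx_id) as ins_trx_id,
       nvl(n.new_trx_2, s.upd_trx_id) as upd_trx_id,
       s.col1, s.col2   -- list the remaining SA columns here
from sa s
left join nit_sa n on n.sa_rowid = s.rowid;

-- swap the tables once sa_new is verified, then recreate the indexes,
-- constraints, and grants on the new table
alter table sa rename to sa_old;
alter table sa_new rename to sa;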

Ramki

Converting the PL/SQL into a single SQL statement and using statement parallelism should significantly improve the performance:

alter session enable parallel dml;

merge /*+ parallel(50) */ into sa using
(
    select sa_rowid, new_trx_1, new_trx_2
    from nit_sa
) nit_sa
on (sa.rowid = nit_sa.sa_rowid)
when matched then update set
    sa.ins_trx_id = nit_sa.new_trx_1,
    sa.upd_trx_id = nit_sa.new_trx_2;

The above statement will read each table only once. Oracle will automatically divide the table into pieces called granules; there's no need to manually partition the tables.

Parallelism is easy in theory - just add a hint and the run time can improve by an order of magnitude. But getting it right in practice can be tricky. Parallelism requires Enterprise Edition, sufficient resources, a sane configuration, etc. And running parallel DML will lock the entire table until the statement is complete - no other DML can run on the table at the same time. (Although other processes can still read from the table.)
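
One way to confirm the DML actually ran in parallel is to query v$pq_sesstat from the same session right after the MERGE:

-- "DML Parallelized" should be non-zero for the last query; if it is 0,
-- the statement ran serially despite the hint
select statistic, last_query
from v$pq_sesstat
where statistic in ('Queries Parallelized', 'DML Parallelized');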

Jon Heller
  • Don't update - create a new table: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:6407993912330 – Ramki Mar 12 '19 at 05:09
  • @Ramki There are pros and cons of both approaches. If this is only a one-time thing then an UPDATE, even if it's slower, may be a good idea. Recreating is faster but it's not always trivial to recreate things perfectly (we tend to forget about things like grants, sequence values, etc.) Or if this update modifies only 50M of the 500M rows, then the update might be faster. – Jon Heller Mar 12 '19 at 08:11
  • @Ramki, creating a new table and renaming it sounds like a good idea. The only issue I see is space, as table SA is huge - more than 220 GB would be needed for the new table. When I created the NIT_SA table (500+ million records), it took just 12 minutes with a parallel 8 hint, so a parallel insert is definitely a good idea. – Nitin Tripathi Mar 12 '19 at 09:34
  • @JonHeller, MERGE has just one drawback, considering the time constraint: if anything goes wrong and the process has to be killed, the records will stay locked until the ROLLBACK completes, and the table/records will be unusable in the meantime (a major constraint on a live system). What is your opinion on this? – Nitin Tripathi Mar 13 '19 at 05:52
  • @NitinTripathi Yes, rows will be locked until the statement is committed or rolled back. Rollback can run in parallel, and is about as fast as the actual change, so in the worst case you need to plan for 2X the MERGE time for a row lock. You can decrease the potential lock time by breaking the statement into chunks, but that would be slower as it requires constantly re-reading the table. If the table was partitioned, then each partition could be updated individually, which would be a little slower but would shrink the potential lock time. There's no perfect solution, only trade-offs. – Jon Heller Mar 13 '19 at 06:06
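
For reference, a rough sketch of the chunked approach described in the last comment, using Oracle's DBMS_PARALLEL_EXECUTE package (the task name, chunk size, and parallel level are illustrative). Each chunk is committed independently, which bounds how long any one set of rows stays locked:

begin
  dbms_parallel_execute.create_task(task_name => 'upd_sa');

  -- split SA into rowid ranges of roughly 10,000 rows each
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'upd_sa',
    table_owner => user,
    table_name  => 'SA',
    by_row      => true,
    chunk_size  => 10000);

  -- update each chunk and commit it independently; a kill or rollback
  -- then only affects the chunks still in flight
  dbms_parallel_execute.run_task(
    task_name      => 'upd_sa',
    sql_stmt       => 'update sa s
                          set (ins_trx_id, upd_trx_id) =
                              (select n.new_trx_1, n.new_trx_2
                                 from nit_sa n
                                where n.sa_rowid = s.rowid)
                        where s.rowid between :start_id and :end_id
                          and exists (select 1 from nit_sa n
                                       where n.sa_rowid = s.rowid)',
    language_flag  => dbms_sql.native,
    parallel_level => 10);

  dbms_parallel_execute.drop_task(task_name => 'upd_sa');
end;
/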