Oracle BEST PRACTICE to update 50 million child rows in a table using value from parent table

Question

I have a child table with 100 million rows and need to update 50 million rows of a column using the value from the parent table. I have read around that assuming if we have enough space, it would be the fastest to "create table as select", but I want to know if anyone disagrees or if other factors are required in order to make a better guess? Would it be better to go this route versus using pl/sql's BULK COLLECT FORALL UPDATE feature?

If this is a process that has to happen frequently, then try it both ways while tracing it using extended SQL trace and then you won't have to guess which one is better. If it's something that has to happen only once, then it probably doesn't matter which way you go. But you'd have to show more of your work before we could be highly confident in saying "it doesn't matter". — Jeff Holt, Jul 12 '17 at 21:39
If the field on the child table is part of an index or indexes I suggest you drop the index(es) prior to performing the update, and recreate it/them afterwards. — Bob Jarvis - Слава Україні, Jul 12 '17 at 23:23

score 1 · Answer 1 · answered Jul 12 '17 at 23:22

If you have a lot of data then CREATE TABLE AS SELECT is definitely faster because it does not require UNDO table space. However, to recreate all the indices on the new table can be quite a hassle due to name conflicts.

Good news is: 50 min rows is not really a lot of data. If you have a modern midrange server it should not cause problems so it is not worth the extra work. The best way to find out is to make a copy of the original table (including all indices) and try the update there. Then you get a rough idea how long it will take.

score 0 · Answer 2 · answered Jul 13 '17 at 07:16

Parallel UPDATE is probably the best option for a large change to a child table. (If you have Enterprise Edition, sufficient resources, a sane configuration, etc.)

alter session enable parallel dml;
update /*+ parallel */ ...;

(You might want to play with different parallel numbers, like parallel(8). The default degree of parallelism is usually good enough. But some platforms like SPARC inflate their "CPU_COUNT", leading to ridiculous degrees of parallelism.)

Parallel UPDATE is likely not the optimal solution. Recreating the objects can be faster because it can almost completely avoid generating REDO and UNDO. But re-creating objects is usually buggy and getting that optimal performance is tricky.

Here are things to consider before you decide to simply drop and recreate a table:

Grants. Save and re-apply the object grants after the objects are recreated.
Dependent objects. The process needs to re-create all objects, and dependent objects, in the exact same way. This can be painfully difficult depending on how complex your schema is. DBMS_METADATA can be tricky, and in some cases still won't make the objects exactly the same way. If you decide to hard-code the DDL instead you have to remember to update the process whenever the objects change.
Invalid objects. Most objects will automatically recompile when necessary. But you probably don't want to wait for that because it always looks bad to have invalid objects. And even if they do compile correctly, some programs may still get those pesky ORA-04068: existing state of packages has been discarded errors. (Because most PL/SQL programmers are unaware of session state and make every package variable public by default.)
Statistics. Simply re-gathering them after the table is re-created is not always sufficient. Histograms depend on whether columns were used in a predicate. If the table is re-created all the columns are new and no histograms will be initially created.
Direct-path writes are elusive. A parent-child table implies a foreign key, which normally prevents direct-path writes. The process needs to disable or drop the foreign key. And also set the table and index to NOLOGGING, and then remember to set them back to LOGGING at the end. When you re-create the foreign key, if you want to do it in parallel you have to initially create it as NOVALIDATE, set the table to parallel, enable validate the constraint, and then set the table back to NOPARALLEL.

In a large data warehouse it's worth going through all those steps and building code for dealing with all the issues. If this is your only large table UPDATE I suggest you avoid that work and accept a slightly non-optimal solution.

Oracle BEST PRACTICE to update 50 million child rows in a table using value from parent table

2 Answers2