Massive UPDATE vs. MERGE performance on Oracle

Question

I am trying to update a destination table from a source table using a single, massive UPDATE statement, but the execution time is way longer than it should.

Query

UPDATE MY_DEST
SET (DEST_B, DEST_C) = (
  SELECT SRC_A + SRC_B, SRC_B
  FROM MY_SRC
  WHERE SRC_KEY = DEST_KEY AND SRC_DATE = DEST_DATE
);

Both tables contain roughly 10 to 13 million rows, they have matching primary keys and we can safely assume that each row in the destination table has a corresponding row in the source table.

Table definitions

CREATE TABLE MY_SRC (
  SRC_KEY VARCHAR2(50),
  SRC_DATE DATE,
  SRC_A NUMBER(15,2),
  SRC_B NUMBER(15,2),
  CONSTRAINT MY_SRC_PK PRIMARY KEY (SRC_KEY, SRC_DATE)
);

CREATE TABLE MY_DEST (
  DEST_KEY VARCHAR2(50),
  DEST_DATE DATE,
  DEST_B NUMBER(15,2),
  DEST_C NUMBER(15,2),
  CONSTRAINT MY_DEST_PK PRIMARY KEY (DEST_KEY, DEST_DATE)
);

Execution plan

Plan hash value: 3904754293

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |           |    12M|   675M|   128M (20)| 01:23:55 |
|   1 |  UPDATE                      | MY_DEST   |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | MY_DEST   |    12M|   675M| 69756   (1)| 00:00:03 |
|   3 |   TABLE ACCESS BY INDEX ROWID| MY_SRC    |     1 |    46 |     4   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | MY_SRC_PK |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("SRC_KEY"=:B1 AND "SRC_DATE"=:B2)

Questions

Is the above UPDATE query hopelessly forced to use a slow, row-by-row execution plan?

Can I optimize the above query only by rewriting it as a MERGE statement, such as the following one?

Alternate query, rewritten as a MERGE

MERGE INTO MY_DEST
USING (SELECT SRC_KEY, SRC_DATE, SRC_B, SRC_A + SRC_B AS SRC_C FROM MY_SRC)
ON (DEST_KEY = SRC_KEY AND DEST_DATE = SRC_DATE)
WHEN MATCHED THEN UPDATE SET DEST_B = SRC_B, DEST_C = SRC_C;

Alternate execution plan

Plan hash value: 2444580570

----------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |         |    12M|   638M|       |   359K  (1)| 00:00:15 |
|   1 |  MERGE               | MY_DEST |       |       |       |            |          |
|   2 |   VIEW               |         |       |       |       |            |          |
|*  3 |    HASH JOIN         |         |    12M|  2260M|   716M|   359K  (1)| 00:00:15 |
|   4 |     TABLE ACCESS FULL| MY_SRC  |    12M|   568M|       |   162K  (1)| 00:00:07 |
|   5 |     TABLE ACCESS FULL| MY_DEST |    12M|  1695M|       | 69756   (1)| 00:00:03 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEST_KEY"="SRC_KEY" AND "DEST_DATE"="SRC_DATE")

Can I get the same, good performance of the alternate MERGE statement when using an UPDATE?

I am using Oracle 12c.

Does your merge statement run faster than your update statement? If so, I would use that instead. You could potentially reduce the amount of work by adding a predicate to the `when matched` clause to only update when it has to, e.g. `where dest_b != src_b or dest_c != src_c`. If all or most of the rows match that, performance won't be improved that much, but if this is a frequently run process (e.g. daily), you might see some benefits. — Boneist, Jan 21 '21 at 14:54
@Boneist this is a very good point relevant to *preformence*, but you should not forgert that columns are `NULL`able and consider it in your predicate. Such as `nvl(dest_b,'x') != nvl(src_b,'x') to get the exact behaviour ot the original statement. — Marmite Bomber, Jan 21 '21 at 16:24

Marmite Bomber · Accepted Answer · 2021-01-21T16:10:49.927

1) Yes, you never want to make an (estimated) 12M times an index access on a large table.

The index blocks will be largely on the disc, so you may expect about access to 100 rows per seconds.

Even if you get 1000 index aceess per second due to caching, you still have to wait hours for processing 12M rows.

So the execution plan 1 is NO GO for a large table.

2) The execution plan of the MERGE (hash join with two full table scans) looks fine and should work.

3) Yes you can use UPDATE statement without problems in your setup. You will use the feature of the updatable join view

Query

update (
select SRC_A, SRC_B,DEST_B, DEST_C 
from MY_DEST d
left outer join MY_SRC s
on d.DEST_KEY = s.SRC_KEY and d.DEST_DATE = s.SRC_DATE)
set DEST_B = SRC_A + SRC_B,
    DEST_C = SRC_B
;

Execution plan

---------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |         |  1000K|    57M|       |  6682   (1)| 00:00:01 |
|   1 |  UPDATE             | MY_DEST |       |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |         |  1000K|    57M|    40M|  6682   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| MY_DEST |  1000K|    28M|       |  1341   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| MY_SRC  |  1000K|    28M|       |  1341   (2)| 00:00:01 |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("D"."DEST_DATE"="S"."SRC_DATE"(+) AND 
              "D"."DEST_KEY"="S"."SRC_KEY"(+))

You see the similar execution plan is created as for the MERGE, so you will excpect also similar performance.

Final Notes

You may additionally use parallel hint to speed up.

Don't forget you must enable parallel dml in your session.

ALTER SESSION ENABLE PARALLEL DML;

Your MERGE statement is not equal to your first UPDATE statement. The difference appears when a primary key exists in the destination table but not in the source table.

The UPDATE resets the destination columns to NULL while the MERGE let them unchanged.

My UPDATEstatement use an outer join so it behaves as your UPDATE - switch to inner join to get the MERGE behaviour.

Massive UPDATE vs. MERGE performance on Oracle

1 Answers1

Linked