
Environment: Oracle 12C

Got a table with about 10 columns, which include a few CLOB and DATE columns. This is a very busy table for an ETL process, as described below:

Flat files are loaded into the table first, then the rows are updated and processed. The inserts and updates happen in batches; millions of records are inserted and updated.

There is also a delete process to remove old data from the table based on a date field. The delete process runs as a PL/SQL procedure and deletes from the table in a loop, fetching only the first n records at a time based on the date field.
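For reference, the procedure is roughly along these lines (simplified; the table and column names here are placeholders, and the batch size and retention window are examples):

```sql
DECLARE
  l_deleted PLS_INTEGER;
BEGIN
  LOOP
    -- delete one batch of old rows; names and thresholds are illustrative
    DELETE FROM etl_stage
     WHERE load_date < TRUNC(SYSDATE) - 30
       AND ROWNUM <= 10000;

    l_deleted := SQL%ROWCOUNT;
    COMMIT;                      -- release undo and locks after each batch
    EXIT WHEN l_deleted = 0;     -- stop when no more old rows remain
  END LOOP;
END;
/
```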

I do not want the delete process to interfere with the regular inserts/updates. What is the best practice for coding the delete so that it has minimal impact on the regular insert/update process?

I could also partition the table and delete in parallel, since each partition uses its own rollback segment, but I am looking for a simpler way to tune the delete process. Any suggestions on using a special rollback segment, or other tuning tips?

APC
RKA
  • What populates the date field the Delete process uses? Is it related to input date? Is there any possibility that the Update process will change that date? Is there any possibility that the Delete process will target rows which are also being targeted by the Update process? What criteria does the Delete process use to select rows for deletion? – APC Apr 16 '20 at 07:24

1 Answer


The first thing to look at is decoupling the various ETL processes, so that they need not all run together or in a particular sequence. That removes the dependency between the INSERTs/UPDATEs and the DELETEs. While the insert/update can be handled in a single MERGE block in your ETL, you can defer the delete by simply marking rows to be deleted later, i.e. a soft delete. You could implement this as a flag column in your table, and use that flag in your application and queries to filter those rows out.
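A minimal sketch of such a soft delete (table and column names are illustrative, not from your schema):

```sql
-- one-time change: add a purge flag, defaulting to 'not purged'
ALTER TABLE etl_stage ADD (purge_flag CHAR(1) DEFAULT 'N');

-- cheap marking step; runs quickly inside the ETL window
UPDATE etl_stage
   SET purge_flag = 'Y'
 WHERE load_date < TRUNC(SYSDATE) - 30;

-- application queries exclude soft-deleted rows
SELECT id, payload, load_date
  FROM etl_stage
 WHERE purge_flag = 'N';

-- the physical delete runs later, off the critical path
DELETE FROM etl_stage
 WHERE purge_flag = 'Y';
```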

By doing the delete later, the critical path of your ETL should shrink. Partitioning the data by date range should definitely help you maintain the data and also make the transactions efficient, if they are date driven. Also, look for any row-by-row ("slow-by-slow") processing and convert it to bulk operations. Avoid context switching between SQL and PL/SQL as much as possible.
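As an illustration of bulk processing, a delete loop can use BULK COLLECT and FORALL so there is one SQL/PL-SQL context switch per batch rather than per row (again, names and the batch size are placeholders):

```sql
DECLARE
  TYPE t_rowids IS TABLE OF ROWID;
  l_rowids t_rowids;
  CURSOR c IS
    SELECT ROWID
      FROM etl_stage
     WHERE load_date < TRUNC(SYSDATE) - 30;
BEGIN
  OPEN c;
  LOOP
    -- fetch a batch of candidate rows in one round trip
    FETCH c BULK COLLECT INTO l_rowids LIMIT 10000;
    EXIT WHEN l_rowids.COUNT = 0;

    -- delete the whole batch in a single bulk-bound statement
    FORALL i IN 1 .. l_rowids.COUNT
      DELETE FROM etl_stage WHERE ROWID = l_rowids(i);

    COMMIT;
  END LOOP;
  CLOSE c;
END;
/
```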

If you partition the table by date range, then you can look into DROP/TRUNCATE PARTITION, which discards the rows stored in that partition as a DDL statement. This cannot be rolled back. It executes quickly and uses few system resources (undo and redo). You can read more about it in the documentation.
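For example, with a monthly range-partitioned table (purely illustrative DDL, not your actual schema), old data can be discarded without any DELETE at all:

```sql
-- monthly interval partitioning on the date column
CREATE TABLE etl_stage (
  id        NUMBER,
  payload   CLOB,
  load_date DATE
)
PARTITION BY RANGE (load_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01') );

-- DDL: near-instant, minimal undo/redo, cannot be rolled back;
-- UPDATE INDEXES keeps global indexes usable
ALTER TABLE etl_stage TRUNCATE PARTITION p_initial UPDATE INDEXES;
```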

Lalit Kumar B
  • Inserts and updates are done by separate components of the Java app: first the data is loaded into the table in batches and committed; then a second process applies the business rules, processes the rows again in batches, and updates certain fields of the rows as it goes. The data to be purged is already clearly defined by a rule based on the date field (older than n days). Can you share an example of the soft delete you suggested? The partitioning criterion is simple for me, as I base my delete on a date field; I only need two partitions – current and previous month. – RKA Apr 16 '20 at 01:36
  • 1
    For partitioning, you just need to ensure two things, 1. There are no new inserts for old partition so it doesn't go to default. Depending on this you can decide whether DROP or TRUNCATE. 2. Referential integrity of the rows in the partition to be truncated. If yes, then you need to defer it first, then delete. – Lalit Kumar B Apr 16 '20 at 06:35
  • 1
    @RKA - *"I only need two partitions – current and previous month"* Partitioning like that would only help if you're going to delete records once a month. Also, the usual reminder that Partitioning is a chargeable extra to the Enterprise Edition, so make sure you're properly licensed to use it. – APC Apr 16 '20 at 07:27