
I have three tasks that need to be done on a relational database (SQL Server or Oracle):

1. Archive: copy a set of rows with status S from Table1 to ArchiveTable1, setting the archive status to X.
2. Delete: mark ArchiveTable1 rows for deletion, i.e. update the status from X to Y.
3. Purge: mark ArchiveTable1 rows for purge, i.e. update the status from Y to Z.

(Here, "delete" just means deleting references in related tables while keeping the row itself; "purge" will delete the row from the archive table.)
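The three status transitions can be sketched as plain SQL. Below is a minimal runnable sketch using Python's `sqlite3` as a stand-in for Oracle/SQL Server; the column names (`id`, `status`, `ts`) are assumptions, since the real schema isn't shown:

```python
# Minimal sketch of the archive/delete/purge transitions. sqlite3 stands in
# for Oracle/SQL Server; table and column names are assumed, not the real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1        (id INTEGER PRIMARY KEY, status TEXT, ts TEXT);
    CREATE TABLE ArchiveTable1 (id INTEGER PRIMARY KEY, status TEXT, ts TEXT);
    INSERT INTO Table1 VALUES (1, 'S', '2016-03-01'), (2, 'A', '2016-03-02');
""")

# 1. Archive: copy rows with status S, setting the archive status to X.
conn.execute("""
    INSERT INTO ArchiveTable1 (id, status, ts)
    SELECT id, 'X', ts FROM Table1 WHERE status = 'S'
""")

# 2. Delete: mark archived rows for deletion (X -> Y).
conn.execute("UPDATE ArchiveTable1 SET status = 'Y' WHERE status = 'X'")

# 3. Purge: mark rows for purge (Y -> Z); a later job physically removes them.
conn.execute("UPDATE ArchiveTable1 SET status = 'Z' WHERE status = 'Y'")
conn.commit()

print(conn.execute("SELECT id, status FROM ArchiveTable1").fetchall())
# [(1, 'Z')]
```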

We implemented this with a timer in the app server: for each of the tasks above, it picks 1000 rows based on timestamp and a specific status and processes them.
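One tick of that timer job might look like the sketch below: select up to a batch of ids for the given status ordered by timestamp, update them in one transaction, and return the count so the caller can loop until no rows remain. Again `sqlite3` stands in for the real database and the schema is assumed:

```python
# Sketch of one timer tick: pick up to BATCH_SIZE rows with a given status,
# ordered by timestamp, and advance them in a single transaction.
# sqlite3 stands in for Oracle/SQL Server; schema names are assumptions.
import sqlite3

BATCH_SIZE = 1000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ArchiveTable1 (id INTEGER PRIMARY KEY, status TEXT, ts TEXT);
    INSERT INTO ArchiveTable1 VALUES (1, 'X', '2016-03-01'),
                                     (2, 'X', '2016-03-02');
""")

def run_batch(conn, from_status, to_status):
    """Advance one batch of rows from from_status to to_status; return count."""
    rows = conn.execute(
        "SELECT id FROM ArchiveTable1 WHERE status = ? ORDER BY ts LIMIT ?",
        (from_status, BATCH_SIZE)).fetchall()
    ids = [r[0] for r in rows]
    if ids:
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE ArchiveTable1 SET status = ? WHERE id IN ({placeholders})",
            [to_status, *ids])
        conn.commit()
    return len(ids)

print(run_batch(conn, "X", "Y"))  # 2
```

Note that on Oracle the SELECT and UPDATE would typically be combined with `SELECT ... FOR UPDATE` (or a single `UPDATE ... WHERE status = :s AND ROWNUM <= :n`) so two instances can't pick the same batch.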

This works fine with a single app server. But with multiple app servers, in Oracle the table gets locked (not just the rows), and some of the other processes running these tasks time out or hang.

My question is: are there database design patterns for this kind of batch processing job? If so, what is the best pattern, and where can I find more details about it?

Satish Rao
  • What leads you to believe that the "table gets locked"? Nothing in your description would appear to cause any of your sessions to acquire a lock on the table (other than the lock that prevents DDL while queries are running). Readers do not block writers and writers do not block readers in Oracle, so I would wager that you have two sessions trying to update the same row. You could also have something less common like a bitmap index that is causing one session to block the other. – Justin Cave Mar 16 '16 at 17:48
  • The Archive process inserts rows, the Delete and Purge processes are not really deleting the rows, they just update the status column to a specific value indicating Marked for Deletion and Marked for Purge. These are update commands happening inside a transaction scope. So the two app server processes might try to update the same rows causing the lock. I know the locks occurred because the table could not be updated from independent sessions and also we had to kill a session which had the table lock before we could use the table again. – Satish Rao Mar 16 '16 at 18:24
  • What queries lead you to believe that the issue is that the table is locked? Nothing in your description appears to require a table lock. I would wager that you have multiple sessions trying to update the same row and that you have sessions blocking on row-level locks. Depending on how you have written the process, it is possible that you have a process that is taking a row-level lock on every row in a table by doing a `select ... for update`. I am hard-pressed to envision how a table lock would ever be involved here. – Justin Cave Mar 16 '16 at 18:36
  • You are right this could be row level lock. I only assumed it as table lock because we could not update any other row in that table. However it is possible all the rows were locked. I am interested to see if there are patterns that I can use to do this kind of batch processing. – Satish Rao Mar 16 '16 at 19:23
  • The basic pattern would be to ensure that each thread is working on a separate set of rows and that you don't have two threads trying to update the same set of rows. Exactly how you accomplish that is highly implementation-specific. Often, there are natural methods of partitioning data (different business units, product lines, time periods, etc.) but you'd need to figure out the most reasonable way of ensuring that two threads aren't modifying the same data. If you move the logic into the database, you could use the `dbms_parallel_execute` package to run multiple threads. – Justin Cave Mar 16 '16 at 19:28
  • Unfortunately, I cannot move the logic to the database. It has to be done in the code. You said "Exactly how you accomplish that is highly implementation-specific"; however, if there are best practices for doing it where the implementation is prescribed, I would like to know about them. We may not be able to implement them as-is, but we would get better ideas of other possible solutions out there. – Satish Rao Mar 16 '16 at 21:19
  • Look at your data. Figure out a way to partition the data in such a way that two threads aren't processing the same row. That could be by finding some attribute of the data that makes partitioning easy (i.e. one thread processes orders for one state while the other processes orders for a different state or one thread processes January orders while another does February). Or implement some synthetic approach that ensures that the two sets are distinct (i.e. break up the parent table by `rowid`). How you implement the division depends heavily on the application. – Justin Cave Mar 16 '16 at 22:05
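The "synthetic partitioning" idea from the comments can be sketched in a few lines: each app server instance claims only the rows whose id falls in its slot, so two instances never touch the same row. In Oracle the same predicate could be pushed into the WHERE clause (e.g. `MOD(id, :n) = :slot`); `FOR UPDATE SKIP LOCKED` is another Oracle option for letting competing consumers skip rows another session has already claimed. The function below is a hypothetical helper, not part of the original code:

```python
# Sketch of modulo-based work partitioning: with NUM_SERVERS instances,
# instance `slot` processes only ids where id % NUM_SERVERS == slot,
# so the slices are disjoint and no two instances update the same row.
NUM_SERVERS = 3

def rows_for_server(ids, slot, num_servers=NUM_SERVERS):
    """Return the subset of row ids this server instance should process."""
    return [i for i in ids if i % num_servers == slot]

ids = list(range(10))
slices = [rows_for_server(ids, s) for s in range(NUM_SERVERS)]
print(slices)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The slices cover every id exactly once, which is the property that prevents two app servers from blocking each other on row locks.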
