I have a requirement where a large number of records need to be deleted from a table using a Pentaho ETL job. I know this can easily be implemented with the Delete step in a transformation, but I am wary of using it because the Delete step can lock the table for the entire duration of the delete operation. Since the number of records is huge and deleting them can take a long time, other queries that try to access the same table would be blocked for that period. I am not sure whether my suspicion is correct. If it is, is there any way to write a job that deletes the records in chunks?
1 Answer
This is database dependent, not Kettle dependent. So you had better either:
a) Truncate the table instead of deleting. A truncate is usually quick, but it erases all the records.
b) Use a job with a first transformation to delete the records and a second transformation to process the data.
As it is unusual to delete from a table while feeding it at the same time, I guess you are making a selective delete (with a WHERE clause). So option (a) is not a solution, while option (b) gives you fine control over which parts of the table you are working on, so that you do not step on your own toes.
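If you really must delete selectively while other processes read the table, a common pattern (database-side, not Kettle-specific) is to delete in small batches, so that each statement commits quickly and locks are held only briefly: a job loops over an Execute SQL script step (or a small transformation) and stops when a batch deletes zero rows. A minimal sketch, assuming MySQL and a hypothetical table `big_table` with an indexed `created_at` column that your WHERE clause filters on:

```sql
-- Hypothetical example: remove at most 10,000 matching rows per batch.
-- Each batch commits on its own, so concurrent readers are blocked only briefly.
-- LIMIT on DELETE is MySQL syntax; SQL Server uses DELETE TOP (10000),
-- and on PostgreSQL/Oracle you would delete by primary-key ranges instead.
DELETE FROM big_table
WHERE created_at < '2017-01-01'
LIMIT 10000;
```

In the job, repeat this step until the affected row count is 0 (for example with a Simple evaluation entry checking a rows-deleted variable), and tune the batch size to your database's transaction-log capacity.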

AlainD
Yes, it's a selective delete. As far as option (b) is concerned, it cannot solve the problem here: the same table is used by two processes simultaneously, where one process is the Kettle job that deletes the records and the other reads the table for further processing. – Krishna Gond Jan 05 '18 at 11:31