0

Spring Batch framework defines several BATCH_ prefixed tables.

They leave index definitions to end-users and curtain queries can't avoid full scan by design. So performance requires slimming of tables.

We don't need to keep historical BATCH_* data for more than 1 week.

I can't come up with rejection strategy for Postgres that doesn't require all our batch processes stopping.

If it is easy to stop Batch jobs then I can truncate or drop BATCH_* tables. This require coordination between DB maintenance and Batch job maintenance.

I think about delete data based on BATCH_JOB_EXECUTION.CREATE_TIME < current_date - 7 with corresponding joins. Special care should be taken to Postgres to reclaim used memory via vacuum. As I understand it is impossible to reclaim table storage without vacuum full but full locks BATCH_ tables. This blocks Batch processes...

UPDATE My current cleanup plan is (with performance stats in units + seconds):

-- 2.1M  43s
-- Quick cleanup of majority of records.
DELETE FROM batch_step_execution_context bsec
WHERE
  EXISTS (
    SELECT 1 FROM batch_step_execution bse
    WHERE bse.start_time < current_date - 22 and bsec.step_execution_id = bse.step_execution_id);
-- 2.5s
vacuum batch_step_execution_context;

-- 2.1M  40s
-- Quick cleanup of majority of records.
DELETE FROM batch_step_execution bse
WHERE bse.start_time < current_date - 22;
-- 59s
vacuum batch_step_execution;

-- 0  1.4s
-- Full cleanup.
DELETE FROM batch_step_execution_context bsec
WHERE
  EXISTS (
    SELECT 1 FROM batch_step_execution bse
    join batch_job_execution bje on bje.job_execution_id = bse.job_execution_id
    WHERE bje.start_time < current_date - 22 and bsec.step_execution_id = bse.step_execution_id);

-- 0  1.2s
-- Full cleanup.
DELETE FROM batch_step_execution bse
WHERE
  EXISTS (
    SELECT 1 FROM batch_job_execution bje
    WHERE bje.start_time < current_date - 22 and bje.job_execution_id = bse.job_execution_id);

-- 122k  .49s
DELETE FROM batch_job_execution_params bjep
WHERE
  EXISTS (
    SELECT 1 FROM batch_job_execution bje
    WHERE bje.start_time < current_date - 22 and bje.job_execution_id = bjep.job_execution_id);
-- 1.2s
vacuum batch_job_execution_params;

-- 61k  .31s
DELETE FROM batch_job_execution_context bjec
WHERE
  EXISTS (
    SELECT 1 FROM batch_job_execution bje
    WHERE bje.start_time < current_date - 22 and bje.job_execution_id = bjec.job_execution_id);
-- .68s
vacuum batch_job_execution_context;

-- 61k  4.4s
DELETE FROM batch_job_execution bje
WHERE bje.start_time < current_date - 22;
-- .21s
vacuum batch_job_execution;

-- 61k  1.1s
DELETE FROM batch_job_instance bji
WHERE NOT EXISTS (SELECT 1 FROM batch_job_execution bje WHERE bje.job_instance_id = bji.job_instance_id);
-- .33s
vacuum batch_job_instance;
gavenkoa
  • 45,285
  • 19
  • 251
  • 303
  • Possible duplicate of https://stackoverflow.com/questions/47233217/spring-batch-tables-purging. See https://jira.spring.io/browse/BATCH-1747?focusedCommentId=66226&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-66226 – Mahmoud Ben Hassine Jan 23 '19 at 08:55

1 Answers1

1

may be duplicate .... posting the same answer again , hope this helps

i have been struggling with this for a ling time but, there is no standard implementation for this.

Then i came up with a my own stored procedure ,

I have created my own variable - for clearing last 6 months data AGO_SIX_MONTH_DATE

You can use your own value.

The solution is at below link -

Spring Batch Meta-Data tables Purging

Ashish Shetkar
  • 1,414
  • 2
  • 18
  • 35
  • I omitted `select distinct JOB_INSTANCE_ID bulk collect into` and use `not exists` instead to cleanup `batch_job_instance`. – gavenkoa Jan 28 '19 at 13:48
  • You haven't mentioned DB flavor. I suspect it is MSSQL. My queries differ in minor details. Like I doesn't join to `BATCH_JOB_EXECUTION` when delete based on `BATCH_STEP_EXECUTION.START_TIME` though I do that later. It is because `BATCH_STEP_EXECUTION` is the most bloated table in our use. – gavenkoa Jan 28 '19 at 13:49
  • 1
    well the DB here is Oracle , mostly you will find similar structure of batch tables across different databases , to clean up these tables , i have followed the meta data model provided by Spring , you can check the model diagram here - https://docs.spring.io/spring-batch/trunk/reference/html/metaDataSchema.html – Ashish Shetkar Jan 28 '19 at 14:04