
I am trying to migrate data from an Oracle database to SQL Server. I'm currently using a JdbcCursorItemReader to read the Oracle data and a JdbcBatchItemWriter to write it to SQL Server.

My problem is that this is taking far too long. For a table of ~200,000 rows, it takes almost an hour (and I have to run four of these queries, each around 200K rows).

@Bean
public JdbcCursorItemReader<DataPOJO> dataReader() throws Exception {
    final JdbcCursorItemReader<DataPOJO> dataReader = new JdbcCursorItemReader<>();
    dataReader.setDataSource(oracleDataSource);
    dataReader.setSql(Constants.DATA_QUERY);
    dataReader.setRowMapper(new BeanPropertyRowMapper<DataPOJO>(DataPOJO.class));
    return dataReader;
}

@Bean
public JdbcBatchItemWriter<DataPOJO> dataWriter() throws Exception {
    JdbcBatchItemWriter<DataPOJO> dataWriter = new JdbcBatchItemWriter<>();
    dataWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<DataPOJO>());
    dataWriter.setSql(Constants.DATA_MERGE);
    dataWriter.setDataSource(mssDataSource);
    return dataWriter;
}

Does anyone have any tips for improving performance on these kinds of jobs?

Gavin
  • That definitely looks like a DB problem. How long does it take for the SELECT query to return if you run it directly in a SQL/DB browser? Furthermore, have you set autocommit to false? What is the commit size of your step? Generally, you should be able to read and write a couple of thousand entries per second. – Hansjoerg Wingeier Sep 19 '16 at 08:04

2 Answers


Does anyone have any tips for improving performance on these kinds of jobs?

I would approach it this way:

1. Check the wait stats for the running process:

select * from sys.dm_exec_requests where session_id = <<your session id>>

To identify your session id, you can check sys.sysprocesses:

select spid, waittime, lastwaittype, status,
       hostname,     -- the three columns below help you identify your query/spid;
       program_name, -- you can replace spid in the query above to get live status
       nt_username
from sys.sysprocesses

2. Troubleshoot based on the wait type.

TheGameiswar

I would first insert your records into a source table (personally, I'd use a temp table) and then leverage the MERGE command to push the records from that "source" table into your "target" table.

A couple key performance concerns:

  • Make sure you match on an indexed key on your target table
  • Assuming you want to update MATCHED rows, do NOT update the index column(s) as part of your UPDATE statement
  • Set your commit interval decently high. Your DB should be able to merge 1000 rows without breaking a sweat. Committing too frequently will add significant overhead.
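In Spring Batch terms, the commit interval is the chunk size of your step. A minimal sketch of a step honoring the advice above, assuming a StepBuilderFactory and the reader/writer beans from the question (the step name and chunk size are illustrative):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;

public class MigrationStepConfig {

    // Each transaction on the SQL Server side now covers 1000 items
    // instead of committing row by row.
    @Bean
    public Step migrateDataStep(StepBuilderFactory stepBuilderFactory,
                                JdbcCursorItemReader<DataPOJO> dataReader,
                                JdbcBatchItemWriter<DataPOJO> dataWriter) {
        return stepBuilderFactory.get("migrateDataStep")
                .<DataPOJO, DataPOJO>chunk(1000) // commit interval
                .reader(dataReader)
                .writer(dataWriter)
                .build();
    }
}
```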

Now to accomplish this with Spring Batch, you'll probably need a composite ItemWriter (one comes out of the box with the framework). Delegate writer 1 would be the JdbcBatchItemWriter you're already using and would insert to that source table. Delegate writer 2 would be custom and would simply execute your MERGE command.
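A sketch of that wiring, under some assumptions: `Constants.DATA_MERGE_FROM_STAGING` is a hypothetical MERGE statement from the staging table into the target, the first delegate is the question's JdbcBatchItemWriter repointed at the staging table, and CompositeItemWriter is the out-of-the-box composite writer mentioned above (the `List`-based ItemWriter signature is from Spring Batch 4 and earlier):

```java
import java.util.Arrays;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

// Delegate 2: a custom writer that fires the MERGE once each chunk is staged.
class MergeItemWriter implements ItemWriter<DataPOJO> {

    private final JdbcTemplate jdbcTemplate;

    MergeItemWriter(DataSource mssDataSource) {
        this.jdbcTemplate = new JdbcTemplate(mssDataSource);
    }

    @Override
    public void write(List<? extends DataPOJO> items) {
        // Hypothetical constant: MERGE from the staging table into the target.
        jdbcTemplate.execute(Constants.DATA_MERGE_FROM_STAGING);
    }
}

public class CompositeWriterConfig {

    @Bean
    public CompositeItemWriter<DataPOJO> compositeWriter(
            JdbcBatchItemWriter<DataPOJO> dataWriter, DataSource mssDataSource) {
        CompositeItemWriter<DataPOJO> writer = new CompositeItemWriter<>();
        writer.setDelegates(Arrays.asList(
                dataWriter,                        // 1) insert chunk into staging table
                new MergeItemWriter(mssDataSource) // 2) MERGE staging into target
        ));
        return writer;
    }
}
```

The delegates run in order for every chunk, so each commit both stages the rows and merges them, keeping the staging table small.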

Dean Clark