Part 1:
I need to develop a Spring Batch job that reads data from a CSV file and writes it to an Oracle database. I need to implement multithreading/parallel processing to speed up the job, as the records are expected to number in the millions.
Question 1:
Is it more suitable to use multithreading (a task executor) for this purpose, or partitioning (a partitioner)? Which will serve the purpose better?
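For context, the plain multithreaded alternative would attach the task executor directly to the step's tasklet, roughly like this (a minimal sketch; the throttle-limit value is illustrative):

<batch:step id="step1">
    <batch:tasklet task-executor="task-executor" throttle-limit="4">
        <batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter" commit-interval="10" />
    </batch:tasklet>
</batch:step>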
Part 2:
I am trying the partitioner approach. I need to skip the records that cause insertion failures and print them to the logs. A listener implementing SkipListener prints them. The issue I am facing with the partitioner is that my listener method is invoked by each thread for each skipped record, e.g. with 4 threads and 4 skipped records, the console prints 4*4 = 16 records instead of just the 4 skipped records.
Listener print statement:
public class OrderSkipListener {

    private int count; // instance field shared across the partition threads

    @OnSkipInWrite
    public void logWrite(Report item, Throwable t) {
        count++;
        System.out.println("record skipped before writing " + count + " : " + item.toString());
    }
}
Job XML code for the partitioner (the job id is illustrative, as my snippet started mid-file):
<batch:job id="reportJob"> <!-- job id illustrative -->
    <batch:step id="step1">
        <batch:partition step="partitionReadWrite" partitioner="rangePartitioner">
            <batch:handler grid-size="4" task-executor="task-executor" />
        </batch:partition>
    </batch:step>
</batch:job>
<batch:step id="partitionReadWrite">
    <batch:tasklet>
        <batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
                     commit-interval="10" skip-limit="50">
            <batch:skippable-exception-classes>
                <batch:include class="java.sql.SQLException" />
                <batch:include class="org.springframework.dao.DataAccessException" />
            </batch:skippable-exception-classes>
        </batch:chunk>
        <batch:listeners>
            <batch:listener ref="orderSkipListener" />
        </batch:listeners>
    </batch:tasklet>
</batch:step>
<bean id="task-executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
<property name="corePoolSize" value="5" />
<property name="maxPoolSize" value="10" />
<property name="allowCoreThreadTimeOut" value="true" />
</bean>
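The rangePartitioner referenced above implements Spring Batch's Partitioner interface and splits the input into one line range per partition; mine is roughly along these lines (a minimal sketch; the context key names, the totalRecords source, and the range arithmetic are illustrative):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

    private int totalRecords; // number of data lines in the file, set via configuration

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int range = totalRecords / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("minValue", i * range + 1);   // first line of this slice
            context.putInt("maxValue", (i + 1) * range); // last line of this slice
            partitions.put("partition" + i, context);
        }
        return partitions;
    }

    public void setTotalRecords(int totalRecords) {
        this.totalRecords = totalRecords;
    }
}

Note that the step-scoped cvsFileItemReader below does not currently reference these minValue/maxValue entries from the step execution context, so each partition reads the full file.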
Reader and writer:
<bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope = "step">
<property name = "linesToSkip" value = "1"/>
<!-- Read a csv file -->
<property name="resource" value="classpath:cvs/report.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<!-- split it -->
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="date,impressions,clicks,earning" />
<property name = "includedFields" value = "0,1,2,3" />
</bean>
</property>
<property name="fieldSetMapper">
<!-- return back to reader, rather than a mapped object. -->
<!-- <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" /> -->
<!-- map to an object -->
<bean
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="report" />
</bean>
</property>
</bean>
</property>
</bean>
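The report prototype bean that BeanWrapperFieldSetMapper maps each line into looks roughly like this (the field types are my assumption; getters, setters, and toString() omitted for brevity):

public class Report {

    private String date;      // column 0
    private long impressions; // column 1
    private long clicks;      // column 2
    private double earning;   // column 3

    // getters and setters for each field, plus toString(), go here
}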
Writer:
I am using JdbcBatchItemWriter. Is there a thread-safe writer I should use instead?
<bean id="mysqlItemWriter"
class="org.springframework.batch.item.database.JdbcBatchItemWriter" scope = "step">
<property name="dataSource" ref="dataSource" />
<property name="sql">
<value = "{insertquery}/>
</property>
<!-- It will take care matching between object property and sql name parameter -->
<property name="itemSqlParameterSourceProvider">
<bean
class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
</property>
</bean>
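With BeanPropertyItemSqlParameterSourceProvider in place, {insertquery} is a named-parameter insert whose parameter names match the Report properties, along these lines (table and column names illustrative):

INSERT INTO report (report_date, impressions, clicks, earning)
VALUES (:date, :impressions, :clicks, :earning)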
Question 2:
How can I use skip to handle failures? Is there any other way to prevent a single database insertion failure from failing the whole chunk?