
Part 1:
I need to develop a Spring Batch job that reads data from a CSV file and writes it to an Oracle database. Since the records are expected to number in the millions, I need multithreading/parallel processing to speed things up.
Question 1:
Is it more suitable to use multithreading (a task executor on the step) or partitioning (a partitioner) for this purpose? Which will serve it better?
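
For reference, by "multithreading (task executor)" I mean a plain multi-threaded chunk step along these lines; a minimal sketch in Java config, where the names mirror my XML below and the throttle limit of 4 is only an example. (As I understand it, FlatFileItemReader is not thread-safe, which is part of why I am asking.)

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.TaskExecutor;

@Bean
public Step multiThreadedStep(StepBuilderFactory steps,
                              ItemReader<Report> reader,
                              ItemWriter<Report> writer,
                              TaskExecutor taskExecutor) {
    return steps.get("multiThreadedStep")
            .<Report, Report>chunk(10)
            .reader(reader)
            .writer(writer)
            .taskExecutor(taskExecutor)  // several threads share one reader/writer
            .throttleLimit(4)            // cap on concurrent chunk executions
            .build();
}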

Part 2:
I am trying to use a Partitioner. I need to skip the records that cause insertion failures and print them to the logs. A listener implementing SkipListener prints them. The issue I am facing with the partitioner is that my listener method is invoked by each thread for each skipped record: with 4 threads and 4 skipped records, the console prints 4*4 = 16 lines instead of just 4.
Listener print statement:

@OnSkipInWrite
public void logWrite(Report item, Throwable t) {
    count++;
    System.out.println("record skipped before writing " + count + " : " + item.toString());
}
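
Separately from the duplicate invocations, I realize that one listener instance is shared by all partition threads, so the count++ above is not thread-safe. A sketch of the listener with an atomic counter instead (assuming this is the bean class behind the orderSkipListener reference below):

import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.batch.core.annotation.OnSkipInWrite;

public class OrderSkipListener {

    // shared across partition threads, so the counter must be atomic
    private final AtomicInteger count = new AtomicInteger();

    @OnSkipInWrite
    public void logWrite(Report item, Throwable t) {
        System.out.println("record skipped before writing "
                + count.incrementAndGet() + " : " + item);
    }
}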


Job XML code for the partitioner:

<batch:step id="step1">
            <batch:partition step = "partitionReadWrite" partitioner= "rangePartitioner">
                <batch:handler grid-size = "4" task-executor = "task-executor"/>
            </batch:partition>

        </batch:step>
        </batch:job>

            <batch:step id = "partitionReadWrite" >
            <batch:tasklet>
                <batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
                    commit-interval="10" skip-limit="50" >
                    <batch:skippable-exception-classes>

                        <batch:include class = "java.sql.SQLException"/>
                         <batch:include class = "org.springframework.dao.DataAccessException" /> 
                    </batch:skippable-exception-classes>
                </batch:chunk>
                  <batch:listeners>  
                          <batch:listener ref="orderSkipListener" />  
                     </batch:listeners> 
            </batch:tasklet>
            </batch:step>
            <bean id="task-executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
    <property name="corePoolSize" value="5" />
    <property name="maxPoolSize" value="10" />
    <property name="allowCoreThreadTimeOut" value="true" />
</bean>
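
For completeness, rangePartitioner is intended to split the file into line ranges, roughly like this hypothetical sketch (the total record count and the fromLine/toLine key names are my own assumptions, not my actual code):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

    // assumed to be known (or counted) up front
    private static final int TOTAL_RECORDS = 1_000_000;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int rangeSize = TOTAL_RECORDS / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putInt("fromLine", i * rangeSize);
            // the last partition absorbs the remainder
            ctx.putInt("toLine", i == gridSize - 1 ? TOTAL_RECORDS : (i + 1) * rangeSize);
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}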

Reader and writer:

  <bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope = "step">

    <property name = "linesToSkip" value = "1"/>
            <!-- Read a csv file -->
            <property name="resource" value="classpath:cvs/report.csv" />

            <property name="lineMapper">
                <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">

                    <!-- split it -->
                    <property name="lineTokenizer">
                        <bean
                            class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                              <property name="names" value="date,impressions,clicks,earning" />  
                            <property name = "includedFields" value = "0,1,2,3" />
                        </bean>
                    </property>

                    <property name="fieldSetMapper">

                        <!-- return back to reader, rather than a mapped object. -->

                            <!-- <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" /> -->   
                        <!-- map to an object -->
                         <bean
                            class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                            <property name="prototypeBeanName" value="report" />
                        </bean>

                    </property>

                </bean>
            </property>

        </bean>
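
If I go down the line-range route, I gather the reader would have to late-bind its range from the step ExecutionContext, something like this sketch (the fromLine/toLine keys match the hypothetical partitioner above; lineMapper() stands in for the DefaultLineMapper configured in the XML):

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.ClassPathResource;

@Bean
@StepScope
public FlatFileItemReader<Report> cvsFileItemReader(
        @Value("#{stepExecutionContext['fromLine']}") int fromLine,
        @Value("#{stepExecutionContext['toLine']}") int toLine) {

    FlatFileItemReader<Report> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("cvs/report.csv"));
    reader.setLinesToSkip(fromLine + 1);        // +1 to skip the header line
    reader.setMaxItemCount(toLine - fromLine);  // stop at the end of this partition's range
    reader.setLineMapper(lineMapper());         // the DefaultLineMapper from the XML above
    return reader;
}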


Writer:
The writer I am using is JdbcBatchItemWriter. Is there a thread-safe writer I should use instead?

<bean id="mysqlItemWriter"
        class="org.springframework.batch.item.database.JdbcBatchItemWriter" scope = "step">
         <property name="dataSource" ref="dataSource" />        
        <property name="sql">
            <value = "{insertquery}/>

        </property>
        <!-- It will take care matching between object property and sql name parameter -->
        <property name="itemSqlParameterSourceProvider">
            <bean
                class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
        </property>
    </bean>
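
For reference, the same writer in Java config would look roughly like the sketch below. As far as I understand, JdbcBatchItemWriter keeps no per-item state, which is why it is generally treated as safe to share across threads:

import javax.sql.DataSource;

import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;

@Bean
public JdbcBatchItemWriter<Report> mysqlItemWriter(DataSource dataSource) {
    JdbcBatchItemWriter<Report> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    writer.setSql(INSERT_QUERY); // the {insertquery} SQL from the XML above
    // matches Report bean properties to named parameters in the SQL
    writer.setItemSqlParameterSourceProvider(
            new BeanPropertyItemSqlParameterSourceProvider<>());
    return writer;
}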


Question 2:
How can I use skip to handle failures? Is there any other way to prevent a single database insertion failure from failing the whole chunk?
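
My current understanding (please correct me if I am wrong) is that with a fault-tolerant step, a write failure rolls back the chunk and Spring Batch then re-processes the items one at a time, so only the offending record is skipped rather than the whole chunk failing. In Java config, the skip settings from my XML would look roughly like this sketch (reader/writer wiring assumed):

import java.sql.SQLException;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.dao.DataAccessException;

@Bean
public Step partitionReadWrite(StepBuilderFactory steps,
                               ItemReader<Report> reader,
                               ItemWriter<Report> writer) {
    return steps.get("partitionReadWrite")
            .<Report, Report>chunk(10)          // commit-interval="10"
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipLimit(50)                      // skip-limit="50"
            .skip(SQLException.class)
            .skip(DataAccessException.class)
            .build();
}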

  • Can you show the code for your partitioner? Also try debugging your partitioner to see how the threads are accessing it, as in https://www.mkyong.com/spring-batch/spring-batch-partitioning-example/ – The Guest Mar 09 '18 at 22:06
  • Thanks. The Mkyong example reads from a database, with range values supplied to the select query through the stepExecutionContext, but how can I pass my index range values through the StepExecutionContext for reading from a CSV file? Which reader and property can help with this? – hafs Mar 10 '18 at 10:42
  • Found a somewhat similar requirement, but how do I implement the readers for the solution suggested by @Nghia Do? https://stackoverflow.com/questions/39092968/spring-batch-multi-thread-file-reading – hafs Mar 10 '18 at 17:36
