
This is my first question here. I am working on a Spring Batch job and using step partitioning to process 70K records. For testing I am using 1,021 records, and I found that the records are not distributed equally across threads. I am using JdbcPagingItemReader with 5 threads. The distribution should be:

Thread 1 - 205

Thread 2 - 205

Thread 3 - 205

Thread 4 - 205

Thread 5 - 201

But unfortunately this is not happening, and I am getting the record distribution below:

Thread 1 - 100

Thread 2 - 111

Thread 3 - 100

Thread 4 - 205

Thread 5 - 200

That is only 716 records in total, so 305 records are skipped during partitioning. I really have no clue what is happening. Could you please look at the configuration below and let me know if I am missing anything? Thanks in advance for your help.
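For reference, the expected split above is just range arithmetic. A minimal stand-alone sketch of the computation the range partitioner should produce (assuming a `splitvalue` of 205, since 4 × 205 + 201 = 1021):

```java
public class RangeSplit {

    // Computes per-partition row counts the way a range partitioner does:
    // gridSize = ceil(count / splitValue); the last partition takes the remainder.
    static int[] partitionSizes(int count, int splitValue) {
        int gridSize = (int) Math.ceil((double) count / splitValue);
        int[] sizes = new int[gridSize];
        int toRow = 0;
        for (int t = 1; t <= gridSize; t++) {
            int fromRow = toRow + 1;
            toRow = (t == gridSize) ? count : toRow + splitValue;
            sizes[t - 1] = toRow - fromRow + 1;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 1021 rows across 5 partitions -> [205, 205, 205, 205, 201]
        System.out.println(java.util.Arrays.toString(partitionSizes(1021, 205)));
    }
}
```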

<import resource="../config/batch-context.xml" />
<import resource="../config/database.xml" />

<job id="partitionJob"  xmlns="http://www.springframework.org/schema/batch">

    <step id="masterStep" parent="abstractPartitionerStagedStep">

        <partition step="slave" partitioner="rangePartitioner">
            <handler grid-size="5" task-executor="taskExecutor"/>
        </partition>

    </step>

</job>
<bean id="abstractPartitionerStagedStep" abstract="true">
    <property name="listeners">
        <list>
            <ref bean="updatelistener" />
        </list>
    </property>

</bean>
<bean id="updatelistener" 
      class="com.test.springbatch.model.UpdateFileCopyStatus" >
</bean>
<!-- Jobs to run -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
    <tasklet>
        <chunk reader="pagingItemReader" writer="flatFileItemWriter"
            processor="itemProcessor" commit-interval="1" retry-limit="0" skip-limit="100">
        <skippable-exception-classes>
            <include class="java.lang.Exception"/>
        </skippable-exception-classes>
        </chunk>    
    </tasklet>
</step>

<bean id="rangePartitioner" class="com.test.springbatch.partition.RangePartitioner"> 
    <property name="dataSource" ref="dataSource" />
</bean>

<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="5"/>
    <property name="maxPoolSize" value="5"/>
    <property name="queueCapacity" value="100" />
    <property name="allowCoreThreadTimeOut" value="true"/>
    <property name="keepAliveSeconds" value="60" />
</bean>

<bean id="itemProcessor" class="com.test.springbatch.processor.CaseProcessor" scope="step">
    <property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>

<bean id="pagingItemReader"
    class="org.springframework.batch.item.database.JdbcPagingItemReader"
    scope="step">
    <property name="dataSource" ref="dataSource" />
    <property name="queryProvider">
        <bean
            class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
            <property name="dataSource" ref="dataSource" />
            <property name="selectClause" value="SELECT *" />
            <property name="fromClause" value="FROM ( SELECT CASE_NUM ,CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD,
                        SBMT_OFC_NUM,DSTR_CHNL_CD,APRV_OFC_CD,APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() 
                        OVER(ORDER BY CASE_NUM) AS rownumber FROM TSMCASE WHERE PROC_IND ='N' ) AS data" />
            <property name="whereClause" value="WHERE rownumber BETWEEN :fromRow AND :toRow " />
            <property name="sortKey" value="CASE_NUM" />
        </bean>
    </property>
    <!--  Inject via the ExecutionContext in rangePartitioner  -->
    <property name="parameterValues">
        <map>
            <entry key="fromRow" value="#{stepExecutionContext[fromRow]}" />
            <entry key="toRow" value="#{stepExecutionContext[toRow]}" />
        </map>
    </property>
    <property name="pageSize" value="100" /> 
    <property name="rowMapper">
        <bean class="com.test.springbatch.model.CaseRowMapper" />
    </property>
</bean>

<bean id="flatFileItemWriter" class="com.test.springbatch.writer.FNWriter" scope="step" >
</bean>

Here is the partitioner code:

public class OffRangePartitioner implements Partitioner {

    private String officeLst;
    private double splitvalue;
    private DataSource dataSource;
    private static Logger LOGGER = Log4JFactory.getLogger(OffRangePartitioner.class);
    private static final int INDENT_LEVEL = 6;

    public String getOfficeLst() {
        return officeLst;
    }

    public void setOfficeLst(final String officeLst) {
        this.officeLst = officeLst;
    }

    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public OffRangePartitioner() {
        super();
        final GlobalProperties globalProperties = GlobalProperties.getInstance();
        splitvalue = Double.parseDouble(globalProperties.getProperty("springbatch.part.splitvalue"));
    }

@Override
public Map<String, ExecutionContext> partition(int threadSize) {
    FormattedTraceHelper.formattedTrace(LOGGER,"Partition method in OffRangePartitioner class Start",INDENT_LEVEL, Level.INFO_INT);
    final Session currentSession = HibernateUtil.getSessionFactory(HibernateConstants.DB2_DATABASE_NAME).getCurrentSession();

    Query queryObj;
    double count = 0.0;

    final Transaction transaction = currentSession.beginTransaction();
    queryObj = currentSession.createQuery(BatchConstants.PARTITION_CNT_QRY);

    if (queryObj.iterate().hasNext()) {
        count = Double.parseDouble(queryObj.iterate().next().toString());
    }

    int fromRow = 0;
    int toRow = 0;
    ExecutionContext context;

    FormattedTraceHelper.formattedTrace(LOGGER,"Count of total records submitted for processing >> " + count, INDENT_LEVEL, Level.DEBUG_INT);
    int gridSize = (int) Math.ceil(count / splitvalue);
    FormattedTraceHelper.formattedTrace(LOGGER,"Total Grid size based on the count >> " + gridSize, INDENT_LEVEL, Level.DEBUG_INT);
    Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

    for (int threadCount = 1; threadCount <= gridSize; threadCount++) {
        fromRow = toRow + 1;
        if (threadCount == gridSize || gridSize == 1) {
            toRow = (int) count;
        } else {
            toRow += splitvalue;
        }
        context = new ExecutionContext();
        context.putInt("fromRow", fromRow);
        context.putInt("toRow", toRow);
        context.putString("name", "Processing Thread" + threadCount);
        result.put("partition" + threadCount, context);
        FormattedTraceHelper.formattedTrace(LOGGER, "Partition number >> "
                + threadCount + " from Row#: " + fromRow + " to Row#: "
                + toRow, INDENT_LEVEL, Level.DEBUG_INT);

    }
    if (transaction != null) {
        transaction.commit();
    }
    FormattedTraceHelper.formattedTrace(LOGGER,
            "Partition method in OffRangePartitioner class End",
            INDENT_LEVEL, Level.INFO_INT);
    return result;
}

}

Today, I tested the same batch with 1,056 records with Spring Framework debug logging on.

PAGE SIZE 100

SELECT * FROM (
        SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD, 
            APRV_OFC_CD, APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber 
        FROM TCASE 
        WHERE SECARCH_PROC_IND = 'P'
    ) AS data 
WHERE 
    rownumber BETWEEN :fromRow AND :toRow 
ORDER BY 
    rownumber ASC 
FETCH FIRST 100 ROWS ONLY

We update the SECARCH_PROC_IND flag from 'P' to 'C' once each record is processed. Because ROW_NUMBER() in the main query is computed only over rows where SECARCH_PROC_IND = 'P', the row numbers shift whenever any thread updates a flag from 'P' to 'C'.

Looks like this is the issue.
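To see why rows get skipped, here is a small stand-alone simulation (10 rows, two partitions of 5) of how ROW_NUMBER() shifts when another thread flips the flag out from under a fixed range:

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftingRowNumbers {

    // Re-numbers the still-unprocessed rows (flag 'P') and returns the ids whose
    // ROW_NUMBER falls in [fromRow, toRow] -- mimicking the paging query.
    static List<Integer> fetchRange(boolean[] processed, int fromRow, int toRow) {
        List<Integer> page = new ArrayList<>();
        int rownumber = 0;
        for (int id = 0; id < processed.length; id++) {
            if (!processed[id]) {
                rownumber++;                       // ROW_NUMBER() over WHERE flag = 'P'
                if (rownumber >= fromRow && rownumber <= toRow) page.add(id);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        boolean[] processed = new boolean[10];     // 10 rows, all flagged 'P'

        // Thread 1 reads its range 1-5 and flips the flag on each row it processes.
        for (int id : fetchRange(processed, 1, 5)) processed[id] = true;

        // Thread 2 now asks for rownumber 6-10, but only 5 'P' rows remain and
        // they are renumbered 1-5 -- so its range is empty and rows 6-10 are skipped.
        System.out.println(fetchRange(processed, 6, 10)); // prints []
    }
}
```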

  • See query in [this](http://stackoverflow.com/questions/43269325/partiton-rdbms-db2-table-data-either-by-sql-query-or-java) question – Sabir Khan Apr 17 '17 at 12:25

1 Answer


Spring Batch fires the query below to fetch the data from the database:

SELECT * FROM (
        SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS, SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD,
            APRV_OFC_CD, APRV_OFC_NUM, SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber
        FROM TCASE
        WHERE SECARCH_PROC_IND = 'P'
    ) AS data
WHERE
    rownumber BETWEEN :fromRow AND :toRow
ORDER BY
    rownumber ASC
FETCH FIRST 100 ROWS ONLY

After each row is processed, its SECARCH_PROC_IND flag is updated from 'P' to 'C'. Since SECARCH_PROC_IND is used in the WHERE clause, every update reduces the ROW_NUMBER values computed by the subsequent queries Spring Batch fires. This is the root cause of the issue.

We introduced another column, SECARCH_PROC_TMP_IND, which we set to 'P' in the beforeJob() method before batch processing starts, and we use that column in the WHERE clause of the query instead of SECARCH_PROC_IND.

Once the batch has finished, we reset SECARCH_PROC_TMP_IND to NULL in afterJob().

This resolved the partition issue.
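For illustration, the before/after hooks might look roughly like this. This is a sketch using plain JDBC; the real class would implement Spring Batch's JobExecutionListener interface, and the table/column names are the ones from the question:

```java
import java.sql.Connection;
import java.sql.Statement;
import javax.sql.DataSource;

public class ProcTmpFlagListener {

    // Snapshot the work set into the temp column before the job starts...
    static final String MARK_SQL =
        "UPDATE TCASE SET SECARCH_PROC_TMP_IND = 'P' WHERE SECARCH_PROC_IND = 'P'";
    // ...and clear it once the job has finished.
    static final String RESET_SQL =
        "UPDATE TCASE SET SECARCH_PROC_TMP_IND = NULL";

    private final DataSource dataSource;

    public ProcTmpFlagListener(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void beforeJob() throws Exception {
        execute(MARK_SQL);   // ROW_NUMBER() now partitions over a stable column
    }

    public void afterJob() throws Exception {
        execute(RESET_SQL);
    }

    private void execute(String sql) throws Exception {
        try (Connection con = dataSource.getConnection();
             Statement stmt = con.createStatement()) {
            stmt.executeUpdate(sql);
        }
    }
}
```

The reader's WHERE clause then filters on SECARCH_PROC_TMP_IND = 'P', which no thread updates mid-run, so the ROW_NUMBER values stay stable for the life of the job.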

  • Could you both please guide me here: https://stackoverflow.com/questions/62519493/partitions-and-jdbcpagingitemreader-doesnt-gives-correct-values ? – PAA Jun 23 '20 at 05:47