This is my first question here. I am working on a Spring Batch job and using step partitioning to process 70K records. For testing I am using 1021 records, and I found that the partitioning is not happening equally across threads. I am using JdbcPagingItemReader with 5 threads. The distribution should be:
Thread 1 - 205
Thread 2 - 205
Thread 3 - 205
Thread 4 - 205
Thread 5 - 201
But unfortunately this is not happening, and I am getting the record distribution below instead:
Thread 1 - 100
Thread 2 - 111
Thread 3 - 100
Thread 4 - 205
Thread 5 - 200
That is 716 records in total, so 305 records are skipped during partitioning. I really don't have any clue what is happening. Could you please look at the configuration below and let me know if I am missing anything? Thanks in advance for your help.
<import resource="../config/batch-context.xml" />
<import resource="../config/database.xml" />
<job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">
<step id="masterStep" parent="abstractPartitionerStagedStep">
<partition step="slave" partitioner="rangePartitioner">
<handler grid-size="5" task-executor="taskExecutor"/>
</partition>
</step>
</job>
<bean id="abstractPartitionerStagedStep" abstract="true">
<property name="listeners">
<list>
<ref bean="updatelistener" />
</list>
</property>
</bean>
<bean id="updatelistener"
class="com.test.springbatch.model.UpdateFileCopyStatus" >
</bean>
<!-- Jobs to run -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
<tasklet>
<chunk reader="pagingItemReader" writer="flatFileItemWriter"
processor="itemProcessor" commit-interval="1" retry-limit="0" skip-limit="100">
<skippable-exception-classes>
<include class="java.lang.Exception"/>
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
<bean id="rangePartitioner" class="com.test.springbatch.partition.RangePartitioner">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
<property name="corePoolSize" value="5"/>
<property name="maxPoolSize" value="5"/>
<property name="queueCapacity" value="100" />
<property name="allowCoreThreadTimeOut" value="true"/>
<property name="keepAliveSeconds" value="60" />
</bean>
<bean id="itemProcessor" class="com.test.springbatch.processor.CaseProcessor" scope="step">
<property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>
<bean id="pagingItemReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader"
scope="step">
<property name="dataSource" ref="dataSource" />
<property name="queryProvider">
<bean
class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="selectClause" value="SELECT *" />
<property name="fromClause" value="FROM ( SELECT CASE_NUM ,CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD,
SBMT_OFC_NUM,DSTR_CHNL_CD,APRV_OFC_CD,APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER()
OVER(ORDER BY CASE_NUM) AS rownumber FROM TSMCASE WHERE PROC_IND ='N' ) AS data" />
<property name="whereClause" value="WHERE rownumber BETWEEN :fromRow AND :toRow " />
<property name="sortKey" value="CASE_NUM" />
</bean>
</property>
<!-- Inject via the ExecutionContext in rangePartitioner -->
<property name="parameterValues">
<map>
<entry key="fromRow" value="#{stepExecutionContext[fromRow]}" />
<entry key="toRow" value="#{stepExecutionContext[toRow]}" />
</map>
</property>
<property name="pageSize" value="100" />
<property name="rowMapper">
<bean class="com.test.springbatch.model.CaseRowMapper" />
</property>
</bean>
<bean id="flatFileItemWriter" class="com.test.springbatch.writer.FNWriter" scope="step" >
</bean>
Here is the partitioner code:
public class OffRangePartitioner implements Partitioner {
private String officeLst;
private double splitvalue;
private DataSource dataSource;
private static Logger LOGGER = Log4JFactory.getLogger(OffRangePartitioner.class);
private static final int INDENT_LEVEL = 6;
public String getOfficeLst() {
return officeLst;
}
public void setOfficeLst(final String officeLst) {
this.officeLst = officeLst;
}
public void setDataSource(DataSource dataSource) {
this.dataSource = dataSource;
}
public OffRangePartitioner() {
super();
final GlobalProperties globalProperties = GlobalProperties.getInstance();
splitvalue = Double.parseDouble(globalProperties.getProperty("springbatch.part.splitvalue"));
}
@Override
public Map<String, ExecutionContext> partition(int threadSize) {
FormattedTraceHelper.formattedTrace(LOGGER,"Partition method in OffRangePartitioner class Start",INDENT_LEVEL, Level.INFO_INT);
final Session currentSession = HibernateUtil.getSessionFactory(HibernateConstants.DB2_DATABASE_NAME).getCurrentSession();
Query queryObj;
double count = 0.0;
final Transaction transaction = currentSession.beginTransaction();
queryObj = currentSession.createQuery(BatchConstants.PARTITION_CNT_QRY);
final Iterator<?> countIterator = queryObj.iterate();
if (countIterator.hasNext()) {
count = Double.parseDouble(countIterator.next().toString());
}
int fromRow = 0;
int toRow = 0;
ExecutionContext context;
FormattedTraceHelper.formattedTrace(LOGGER,"Count of total records submitted for processing >> " + count, INDENT_LEVEL, Level.DEBUG_INT);
int gridSize = (int) Math.ceil(count / splitvalue);
FormattedTraceHelper.formattedTrace(LOGGER,"Total Grid size based on the count >> " + gridSize, INDENT_LEVEL, Level.DEBUG_INT);
Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
for (int threadCount = 1; threadCount <= gridSize; threadCount++) {
fromRow = toRow + 1;
if (threadCount == gridSize || gridSize == 1) {
toRow = (int) count;
} else {
toRow += splitvalue;
}
context = new ExecutionContext();
context.putInt("fromRow", fromRow);
context.putInt("toRow", toRow);
context.putString("name", "Processing Thread" + threadCount);
result.put("partition" + threadCount, context);
FormattedTraceHelper.formattedTrace(LOGGER, "Partition number >> "
+ threadCount + " from Row#: " + fromRow + " to Row#: "
+ toRow, INDENT_LEVEL, Level.DEBUG_INT);
}
if (transaction != null) {
transaction.commit();
}
FormattedTraceHelper.formattedTrace(LOGGER,
"Partition method in OffRangePartitioner class End",
INDENT_LEVEL, Level.INFO_INT);
return result;
}
}
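To sanity-check the range arithmetic, here is a standalone sketch of the loop above, run with count = 1021 and splitvalue = 205 (the class and method names are mine, for illustration only — this is not the real batch code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangeSplitDemo {

    // Mirrors the fromRow/toRow loop in partition(): last partition absorbs the remainder.
    static Map<String, int[]> split(int count, int splitValue) {
        int gridSize = (int) Math.ceil((double) count / splitValue);
        Map<String, int[]> result = new LinkedHashMap<>();
        int fromRow;
        int toRow = 0;
        for (int t = 1; t <= gridSize; t++) {
            fromRow = toRow + 1;
            toRow = (t == gridSize || gridSize == 1) ? count : toRow + splitValue;
            result.put("partition" + t, new int[] { fromRow, toRow });
        }
        return result;
    }

    public static void main(String[] args) {
        // For 1021 rows split by 205 this yields 1-205, 206-410, 411-615, 616-820, 821-1021,
        // i.e. exactly the 205/205/205/205/201 distribution expected in the question.
        split(1021, 205).forEach(
            (name, range) -> System.out.println(name + ": rows " + range[0] + "-" + range[1]));
    }
}
```

So the partitioner itself hands out correct, non-overlapping row ranges; the uneven counts must come from what those row numbers mean at read time.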
Today, I tested the same batch with 1056 records with Spring Framework debug logging turned on. The generated query (page size 100) was:
SELECT * FROM (
SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD,
APRV_OFC_CD, APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber
FROM TCASE
WHERE SECARCH_PROC_IND ='P'
) AS data
WHERE
rownumber BETWEEN :fromRow AND :toRow
ORDER BY
rownumber ASC
FETCH FIRST 100 ROWS ONLY
We update the SECARCH_PROC_IND flag from 'P' to 'C' once each record is processed. The main query uses ROW_NUMBER() to partition only the records that still have SECARCH_PROC_IND = 'P', so the row numbers shift as soon as any thread updates a flag from 'P' to 'C'.
It looks like this is the issue.
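To make the skipping concrete, here is a toy simulation (all names are hypothetical; this is not the real reader) of how a range over ROW_NUMBER() misses rows once another thread shrinks the set of unprocessed records between page queries:

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftingRowNumberDemo {

    // Simulates "WHERE rownumber BETWEEN from AND to" over a result set that is
    // re-numbered 1..N on every query, the way ROW_NUMBER() over the remaining
    // 'P' rows behaves.
    static List<Integer> readRange(List<Integer> remaining, int from, int to) {
        List<Integer> out = new ArrayList<>();
        for (int rn = 1; rn <= remaining.size(); rn++) {
            if (rn >= from && rn <= to) {
                out.add(remaining.get(rn - 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 10 pending case numbers; partition 2 was assigned "rows 6-10".
        List<Integer> pending = new ArrayList<>();
        for (int i = 1; i <= 10; i++) pending.add(i);

        // Another thread processes rows 1-5 and flips their flag to 'C',
        // so they drop out of the WHERE PROC_IND = 'P' subquery.
        pending.subList(0, 5).clear();

        // The 5 surviving rows are now renumbered 1..5, so the query for
        // rows 6-10 matches nothing and those records are silently skipped.
        List<Integer> seen = readRange(pending, 6, 10);
        System.out.println("Partition 2 saw " + seen.size() + " of its 5 records");
        // prints: Partition 2 saw 0 of its 5 records
    }
}
```

This matches the observed symptom: partitions lose records in proportion to how fast the other threads commit, which is why the per-thread counts (100, 111, 100, 205, 200) look arbitrary. Partitioning on a stable key such as a CASE_NUM range, rather than on a ROW_NUMBER() computed over a shrinking filtered set, would avoid the renumbering.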