I have a huge list of reports loaded into a chunk partition step. Each report is then processed further to generate an individual output report. If I load all 50k reports into the partition step at once, it overloads the server and everything slows down. Instead, I would prefer the partition step to load 3k reports, process them, and then load the next 3k into the partition step, continuing like this until all 50k reports are processed (a sketch of what I have in mind follows my current configuration and mapper code below).
<step id="genReport" next="fileTransfer">
<chunk item-count="1000">
<reader ref="Reader" >
</reader>
<writer
ref="Writer" >
</writer>
</chunk>
<partition>
<mapper ref="Mapper">
<properties >
<property name="threadCount" value="#{jobProperties['threadCount']}"/>
<property name="threadNumber" value="#{partitionPlan['threadNumber']}"/>
</properties>
</mapper>
</partition>
</step>
@Override
public PartitionPlan mapPartitions() throws Exception {
    PartitionPlanImpl partitionPlan = new PartitionPlanImpl();
    int numberOfPartitions = // dao call to load the reports count
    partitionPlan.setThreads(getThreadCount());
    // numberOfPartitions comes from the database and is huge, e.g. 20k to 40k
    partitionPlan.setPartitions(numberOfPartitions);
    Properties[] props = new Properties[numberOfPartitions];
    for (int idx = 0; idx < numberOfPartitions; idx++) {
        Properties threadProperties = new Properties();
        threadProperties.setProperty("threadNumber", String.valueOf(idx));
        // Data pulled from a PriorityBlockingQueue
        GAHReportListData gahRptListData = gahReportListManager.getPageToProcess();
        String dynSqlId = gahRptListData.getDynSqlId();
        threadProperties.setProperty("sqlId", dynSqlId);
        threadProperties.setProperty("outFile", fileName);
        props[idx] = threadProperties;
    }
    partitionPlan.setPartitionProperties(props);
    return partitionPlan;
}
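What I have in mind is roughly the following sketch, where the mapper only maps one capped batch per execution of the step instead of the full list. The batchSize of 3000 and the getPendingReportCount() method on gahReportListManager are assumptions to illustrate the idea, not existing code:

@Override
public PartitionPlan mapPartitions() throws Exception {
    PartitionPlanImpl partitionPlan = new PartitionPlanImpl();
    int batchSize = 3000;                                         // assumed cap per execution of the step
    int remaining = gahReportListManager.getPendingReportCount(); // hypothetical: reports still queued
    int numberOfPartitions = Math.min(remaining, batchSize);      // never map more than one batch at a time

    partitionPlan.setThreads(getThreadCount());
    partitionPlan.setPartitions(numberOfPartitions);

    Properties[] props = new Properties[numberOfPartitions];
    for (int idx = 0; idx < numberOfPartitions; idx++) {
        Properties threadProperties = new Properties();
        threadProperties.setProperty("threadNumber", String.valueOf(idx));
        GAHReportListData gahRptListData = gahReportListManager.getPageToProcess();
        threadProperties.setProperty("sqlId", gahRptListData.getDynSqlId());
        threadProperties.setProperty("outFile", fileName);
        props[idx] = threadProperties;
    }
    partitionPlan.setPartitionProperties(props);
    return partitionPlan;
}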
Once the partition mapper has processed a set of 3k reports, it has to check whether another list is available. If it is, the partitioned step should be reset and run again with the next set of 3k reports, until everything is processed.
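I don't know whether the job XML even allows transitioning back to a step that has already run in the same execution, but the sketch below shows the kind of loop I am after: a decision element after genReport that either loops back for the next 3k batch or ends the job. The MoreReportsDecider, its hasMoreReports() method, and the MORE/DONE statuses are made up for illustration:

<step id="genReport" next="checkForMoreReports">
    <!-- chunk and partition as above -->
</step>
<decision id="checkForMoreReports" ref="MoreReportsDecider">
    <next on="MORE" to="genReport"/>
    <end on="DONE"/>
</decision>

with a Decider along these lines:

import javax.batch.api.Decider;
import javax.batch.runtime.StepExecution;
import javax.inject.Inject;
import javax.inject.Named;

@Named("MoreReportsDecider")
public class MoreReportsDecider implements Decider {

    @Inject
    private GAHReportListManager gahReportListManager; // assumed to know how many reports remain

    @Override
    public String decide(StepExecution[] executions) throws Exception {
        // hypothetical check: keep looping while reports are still queued
        return gahReportListManager.hasMoreReports() ? "MORE" : "DONE";
    }
}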