
I am trying to create a Spring Batch application using the annotation-based approach with a partitioner, triggered by a Quartz scheduler, but I am running into the following issues.

  1. When the job is triggered, the partitions execute sequentially instead of in parallel, i.e. if I have 10 partitions, instead of all 10 being processed together they are processed one by one.

  2. When more than one instance of the job is triggered (this is required for my use case), the instances are not synchronized properly, i.e. when the 2nd instance starts it uses the 1st instance's data, and the 1st instance stops processing but remains active.

Following are my configuration/class files.

BatchConfiguration.java -

@Configuration
@EnableBatchProcessing
public class BatchConfiguration
{
    @Autowired
    private JobBuilderFactory jobBuilders;

    @Autowired
    private StepBuilderFactory stepBuilders;

    @Autowired
    private DataSource dataSource;

    @Bean
    @StepScope
    public JdbcCursorItemReader reader(@Value("#{stepExecutionContext[someParam]}") String someParam) {
        JdbcCursorItemReader jdbcCursorItemReader = new JdbcCursorItemReader();
        jdbcCursorItemReader.setDataSource(getDataSource());
        jdbcCursorItemReader.setSql("myQuery");
        jdbcCursorItemReader.setRowMapper(new NotifRowMapper());
        return jdbcCursorItemReader;
    }

    @Bean
    @StepScope
    public MyProcessor processor() {
        return new MyProcessor();
    }

    @Bean
    public MyPartitioner partitioner() {
        MyPartitioner partitioner = new MyPartitioner();
        partitioner.setDataSource(getDataSource());
        partitioner.setSql("MyPartitionerQuery");
        return partitioner;
    }

    @Bean
    @StepScope
    public JdbcBatchItemWriter writer(DataSource dataSource) {
        JdbcBatchItemWriter writer = new JdbcBatchItemWriter();
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider());
        writer.setSql("MyWriterQuery");
        writer.setDataSource(dataSource);
        return writer;
    }

    @Bean
    public Job springBatch() {
        return jobBuilders.get("springBatch").start(masterStep()).build();
    }

    @Bean
    public Step masterStep() {
        return stepBuilders.get("masterStep")
                .partitioner(slave(reader(null), writer(getDataSource()), processor()))
                .partitioner("slave", partitioner())
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean
    public Step slave(JdbcCursorItemReader reader, JdbcBatchItemWriter writer, MyProcessor processor) {
        return stepBuilders.get("slave")
                .chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setMaxPoolSize(20);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

    @Bean
    public JdbcTemplate jdbcTemplate(DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }

    @Bean
    public DataSource getDataSource() {
        return dataSource;
    }

    @Bean
    public JobRepository getJobRepository() throws Exception {
        MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean();
        factory.setTransactionManager(new ResourcelessTransactionManager());
        factory.setIsolationLevelForCreate("ISOLATION_READ_COMMITTED");
        factory.afterPropertiesSet();
        return (JobRepository) factory.getObject();
    }
}
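One thing worth noting about the taskExecutor bean above: ThreadPoolTaskExecutor delegates to java.util.concurrent.ThreadPoolExecutor, which only grows past its core pool size (which defaults to 1 in ThreadPoolTaskExecutor) once its work queue is full, and the default queue is unbounded, so it never fills. A standalone sketch of those semantics, using the underlying JDK class directly (the class name here is illustrative):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
    public static void main(String[] args) {
        // Core size 1, max size 20, unbounded queue: extra threads are only
        // created when the queue is full, which never happens with an
        // unbounded queue, so every task runs on the single core thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 20, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < 10; i++) {
            // Every iteration prints the same thread name.
            pool.execute(() -> System.out.println(Thread.currentThread().getName()));
        }
        pool.shutdown();
    }
}
```

Setting a core pool size (or a bounded queue) changes this behavior; with only setMaxPoolSize configured, the pool stays at one thread.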

QuartzJob.java (triggers the Spring Batch job) -

public class QuartzJob implements org.quartz.Job
{
    @Override
    public void execute(org.quartz.JobExecutionContext jobExecutionContext) throws org.quartz.JobExecutionException
    {
        try
        {
            AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(BatchConfiguration.class);
            JobLauncher jobLauncher = context.getBean(JobLauncher.class);
            org.springframework.batch.core.Job newJob = context.getBean("springBatch", org.springframework.batch.core.Job.class);
            JobParameters param = new JobParametersBuilder().addLong("time", System.currentTimeMillis()).toJobParameters();
            jobLauncher.run(newJob, param);
        } catch (Exception e) {
            // Surface launch failures to Quartz instead of swallowing them.
            throw new org.quartz.JobExecutionException(e);
        }
    }
}

MyQuartzListener.java (schedules the Quartz job during server start-up) -

public class MyQuartzListener implements ServletContextListener
{
    private Scheduler scheduler;

    @Override
    public void contextDestroyed(ServletContextEvent arg0) { }

    @Override
    public void contextInitialized(ServletContextEvent ctx)
    {
        JobDetail job = JobBuilder.newJob(QuartzJob.class)
                .withIdentity("SPRINGBATCH", "SPRINGBATCH")
                .build();

        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("SPRINGBATCH", "SPRINGBATCH")
                .startNow()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInSeconds(60)
                        .repeatForever())
                .build();

        try
        {
            scheduler = ((StdSchedulerFactory) ctx.getServletContext()
                    .getAttribute(QuartzInitializerListener.QUARTZ_FACTORY_KEY)).getScheduler();
            job.getJobDataMap().put("quartztime", System.currentTimeMillis());
            scheduler.scheduleJob(job, trigger);
        } catch (SchedulerException e) {
            e.printStackTrace();
        }
    }
}

MyPartitioner.java -

public class MyPartitioner implements Partitioner
{
    private DataSource dataSource;
    private String sql;

    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void setSql(String sql) {
        this.sql = sql;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize)
    {
        Map<String, ExecutionContext> partitionMap = new HashMap<String, ExecutionContext>();
        List<String> partitionCodes = getPartitionCodes(sql);
        int count = 1;

        for (String partitionCode : partitionCodes)
        {
            ExecutionContext context = new ExecutionContext();
            context.put("partitionCode", partitionCode);
            context.put("name", "Thread" + count);
            partitionMap.put(partitionCode, context);
            count++;
        }
        return partitionMap;
    }
}
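To make the shape of the partition map concrete, here is a framework-free sketch of the same logic (a plain Map stands in for Spring Batch's ExecutionContext, and the codes are passed in directly instead of coming from MyPartitionerQuery; the class name is illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionMapSketch {

    // One entry per partition code; each value carries the keys the
    // step-scoped reader can later pull out of the step execution context.
    public static Map<String, Map<String, Object>> partition(List<String> partitionCodes) {
        Map<String, Map<String, Object>> partitionMap = new HashMap<>();
        int count = 1;
        for (String partitionCode : partitionCodes) {
            Map<String, Object> context = new HashMap<>();
            context.put("partitionCode", partitionCode);
            context.put("name", "Thread" + count);
            partitionMap.put(partitionCode, context);
            count++;
        }
        return partitionMap;
    }
}
```

Each map key becomes one partition (one slave step execution), so the number of entries returned here determines how many slave executions are handed to the task executor.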

Is there something wrong with this configuration? I am passing the current time to each job instance to identify each instance separately, but it still isn't working.

springenthusiast
  • For question 1, I don't see anything obviously wrong. That being said, I just looked and I don't see any unit test for validating this so it may be a bug in the builder. Further research is required. For question 2, I'd need to see more about your queries. How are you preventing the two job instances from picking up the same data? – Michael Minella May 14 '15 at 14:16
  • Thanks for the reply. I didn't quite follow what you mean by a bug in the builder. For question 2, I thought of updating the picked records to something like "picked for processing" in the DB, so they won't be fetched by another instance. Is this the correct way, or is there an alternative approach I can use? My records are grouped based on a "partitionCode" column in the DB. – springenthusiast May 15 '15 at 05:33
  • Yes, the process indicator pattern is a common batch pattern where a job will mark the records it's about to process so that they aren't picked up on restart or by other job instances. Typically you'll use a listener (step, etc) to mark them as in process and then another listener to mark them as having been processed. – Michael Minella May 15 '15 at 14:33
  • Thanks, but I am still facing the 2 issues mentioned above :( . Also, how do I integrate the Quartz scheduler with this application? I couldn't find any example/document on annotation-based Quartz + Spring Batch. – springenthusiast May 18 '15 at 09:38
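The process indicator pattern described in the comments (claim records before processing so another job instance cannot pick them up) can be sketched framework-free; here an in-memory map stands in for the status column, where a real job would issue an UPDATE from a step listener, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessIndicatorSketch {

    // Stand-in for a status column on the source table.
    private final Map<String, String> statusByCode = new LinkedHashMap<>();

    public ProcessIndicatorSketch(List<String> codes) {
        for (String code : codes) {
            statusByCode.put(code, "NEW");
        }
    }

    // Atomically claim all NEW records for one job instance and return them;
    // a second instance calling claim() afterwards gets nothing back.
    public synchronized List<String> claim(String jobId) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, String> e : statusByCode.entrySet()) {
            if ("NEW".equals(e.getValue())) {
                e.setValue("PROCESSING:" + jobId);
                claimed.add(e.getKey());
            }
        }
        return claimed;
    }

    // Called after a record has been written successfully.
    public synchronized void markDone(String code) {
        statusByCode.put(code, "DONE");
    }
}
```

The essential property is that claiming and reading are one atomic operation (here via synchronized, in a database via a single UPDATE), so two concurrent job instances can never both see the same record as NEW.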

0 Answers