
I am near wits' end. I have read/googled endlessly so far and tried the solutions in all the Google/Stack Overflow posts that describe this same issue (there are quite a few). Some seemed promising, but nothing has worked for me yet, though I have made some progress, and I believe I am on the right track: at this point I suspect it's something with the transaction manager and a possible conflict between Spring Batch and Spring Data JPA.

References:

  1. Spring boot repository does not save to the DB if called from scheduled job
  2. JpaItemWriter: no transaction is in progress

Similar to the aforementioned posts, I have a Spring Boot application that uses Spring Batch and Spring Data JPA. It reads comma-delimited data from a .csv file, does some processing/transformation, and attempts to persist/save to the database using the JPA repository methods, specifically .saveAll() here (I also tried the .save() method, which did the same thing), since I'm saving a List<MyUserDefinedDataType> of a user-defined data type (batch insert).
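
For context, my custom writer is essentially a thin wrapper around the JPA repository; a simplified sketch of what it does (type names changed for this post) looks like:

// Simplified sketch of my custom writer; MyRecord and MyRecordRepository stand in
// for my real user-defined type and its Spring Data JPA repository. Each chunk item
// is itself a List, so the writer just delegates each one to saveAll().
public class MyCustomWriter implements ItemWriter<List<MyRecord>> {

    private final MyRecordRepository repository;

    public MyCustomWriter(MyRecordRepository repository) {
        this.repository = repository;
    }

    @Override
    public void write(List<? extends List<MyRecord>> items) {
        for (List<MyRecord> batch : items) {
            repository.saveAll(batch); // this is the call that never persists on >= 2.2.1
        }
    }
}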

Now, my code was working fine on Spring Boot starter 1.5.9.RELEASE, but I recently attempted to upgrade to 2.x.x and found, after countless hours of debugging, that only version 2.2.0.RELEASE would persist/save data to the database. So an upgrade to >= 2.2.1.RELEASE breaks persistence. Everything is read fine from the .csv; it's just that the first time the code flow hits a JPA repository method like .save() or .saveAll(), the application keeps running but nothing gets persisted. I also noticed the HikariCP pool logs "active=1, idle=4", whereas the same log on version 1.5.9.RELEASE says "active=0, idle=5" immediately after persisting the data, so the application is definitely hanging. I went into the debugger and saw that, after jumping into the repository calls, execution goes into an almost infinite cycle through the Spring AOP libraries and such (all third-party code), and I don't believe it ever comes back to the real application/business logic that I wrote.

3c22fb53ed64 2021-05-20 23:53:43.909 DEBUG
                    [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=5, active=1, idle=4, waiting=0)

Anyway, I tried the most common solutions that worked for other people, which were:

  1. Defining a JpaTransactionManager @Bean and injecting it into the Step, while keeping the JobRepository on the PlatformTransactionManager. This did not work. Then I also tried using the JpaTransactionManager in the JobRepository @Bean as well; this did not work either.
  2. Defining a @RestController endpoint in my application to trigger this job manually, instead of launching it from my main Application.java class (I talk about this more below). Per one of the posts referenced above, the data then persisted correctly to the database, even on Spring >= 2.2.1, which makes me suspect even more that something with the Spring Batch persistence/entity/transaction managers is messed up.

The code is basically this: BatchConfiguration.java

@Configuration
@EnableBatchProcessing
@Import({DatabaseConfiguration.class})
public class BatchConfiguration {

// Datasource is a Postgres DB defined in a separate IntelliJ project that I add to my pom.xml
DataSource dataSource;

// Builder factories are provided by @EnableBatchProcessing; maxThreads and
// chunkSize are injected from my config (property names illustrative)
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Value("${batch.max-threads}")
private int maxThreads;

@Value("${batch.chunk-size}")
private int chunkSize;

@Autowired
public BatchConfiguration(@Qualifier("dataSource") DataSource dataSource) {
    this.dataSource = dataSource;
}

@Bean
@Primary
public JpaTransactionManager jpaTransactionManager() {
    final JpaTransactionManager tm = new JpaTransactionManager();
    tm.setDataSource(dataSource);
    return tm;
}


@Bean
public JobRepository jobRepository(PlatformTransactionManager transactionManager) throws Exception {
    JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
    jobRepositoryFactoryBean.setDataSource(dataSource);
    jobRepositoryFactoryBean.setTransactionManager(transactionManager);
    jobRepositoryFactoryBean.setDatabaseType("POSTGRES");
    return jobRepositoryFactoryBean.getObject();
}

@Bean
public JobLauncher jobLauncher(JobRepository jobRepository) {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository);
    return simpleJobLauncher;
}

@Bean(name = "jobToLoadTheData")
public Job jobToLoadTheData() {
    return jobBuilderFactory.get("jobToLoadTheData")
            .start(stepToLoadData())
            .listener(new CustomJobListener())
            .build();
}

@Bean
@StepScope
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(maxThreads);
    threadPoolTaskExecutor.setThreadGroupName("taskExecutor-batch");
    return threadPoolTaskExecutor;
}

@Bean(name = "stepToLoadData")
public Step stepToLoadData() {
    TaskletStep step = stepBuilderFactory.get("stepToLoadData")
            .transactionManager(jpaTransactionManager())
            .<List<FieldSet>, List<myCustomPayloadRecord>>chunk(chunkSize)
            .reader(myCustomFileItemReader(OVERRIDDEN_BY_EXPRESSION))
            .processor(myCustomPayloadRecordItemProcessor())
            .writer(myCustomerWriter())
            .faultTolerant()
            .skipPolicy(new AlwaysSkipItemSkipPolicy())
            .skip(DataValidationException.class)
            .listener(new CustomReaderListener())
            .listener(new CustomProcessListener())
            .listener(new CustomWriteListener())
            .listener(new CustomSkipListener())
            .taskExecutor(taskExecutor())
            .throttleLimit(maxThreads)
            .build();
    step.registerStepExecutionListener(stepExecutionListener());
    step.registerChunkListener(new CustomChunkListener());
    return step;
}

My main method: Application.java

@Autowired
@Qualifier("jobToLoadTheData")
private Job loadTheData;

@Autowired
private JobLauncher jobLauncher;

@PostConstruct
public void launchJob() throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException {
    JobParameters parameters = new JobParametersBuilder().addDate("random", new Date()).toJobParameters();
    jobLauncher.run(loadTheData, parameters);
}

public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
}

Now, normally I read this .csv from an Amazon S3 bucket, but since I'm testing locally, I just place the .csv in the project directory and read it directly by triggering the job from the Application.java main class (as you can see above). Also, I do have some other beans defined in this BatchConfiguration class, but I don't want to over-complicate this post more than it already is, and from the googling I've done, the problem is most likely in the methods I posted (hopefully).

Also, I would like to point out, similar to one of the other posts on Google/Stack Overflow from a user with a similar problem: I created a @RestController endpoint that simply calls the .run() method of the JobLauncher with the jobToLoadTheData bean, and it triggers the batch insert. Guess what? The data persists to the database just fine, even on Spring >= 2.2.1.
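
A simplified sketch of that endpoint (class name and path are made up for this post; the wiring mirrors the Application.java launcher above):

// Hypothetical trigger endpoint: hands the existing job plus fresh
// JobParameters to the JobLauncher, exactly like the @PostConstruct launcher.
@RestController
public class JobTriggerController {

    private final JobLauncher jobLauncher;
    private final Job jobToLoadTheData;

    public JobTriggerController(JobLauncher jobLauncher,
                                @Qualifier("jobToLoadTheData") Job jobToLoadTheData) {
        this.jobLauncher = jobLauncher;
        this.jobToLoadTheData = jobToLoadTheData;
    }

    @PostMapping("/trigger-job")
    public String triggerJob() throws Exception {
        JobParameters parameters = new JobParametersBuilder()
                .addDate("random", new Date())
                .toJobParameters();
        jobLauncher.run(jobToLoadTheData, parameters);
        return "job launched";
    }
}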

What is going on here? Is this a clue? Is something funky going wrong with some type of entity or transaction manager? I'll take any advice or tips! I can provide any more information you may need, so please just ask.


1 Answer


You are defining a bean of type JobRepository and expecting it to be picked up by Spring Batch. This is not correct. You need to provide a BatchConfigurer and override getJobRepository. This is explained in the reference documentation:

You can customize any of these beans by creating a custom implementation of the
BatchConfigurer interface. Typically, extending the DefaultBatchConfigurer
(which is provided if a BatchConfigurer is not found) and overriding the required
getter is sufficient.

This is also documented in the Javadoc of @EnableBatchProcessing. So in your case, you need to define a bean of type BatchConfigurer and override getJobRepository and getTransactionManager, something like:

@Bean
public BatchConfigurer batchConfigurer(EntityManagerFactory entityManagerFactory, DataSource dataSource) {
    return new DefaultBatchConfigurer(dataSource) {
        @Override
        public PlatformTransactionManager getTransactionManager() {
            return new JpaTransactionManager(entityManagerFactory);
        }

        @Override
        public JobRepository getJobRepository() {
            try {
                JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
                jobRepositoryFactoryBean.setDataSource(dataSource);
                jobRepositoryFactoryBean.setTransactionManager(getTransactionManager());
                // set other properties, then initialize the factory bean
                jobRepositoryFactoryBean.afterPropertiesSet();
                return jobRepositoryFactoryBean.getObject();
            } catch (Exception e) {
                // DefaultBatchConfigurer#getJobRepository declares no checked exceptions
                throw new IllegalStateException("Unable to create JobRepository", e);
            }
        }
    };
}

In a Spring Boot context, you could also extend org.springframework.boot.autoconfigure.batch.JpaBatchConfigurer and override its createTransactionManager and createJobRepository methods if needed.
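
For example, something like the following (a sketch only: the constructor arguments are those of Boot 2.2.x and may differ in other versions):

// Sketch: subclass Boot's JpaBatchConfigurer instead of DefaultBatchConfigurer.
// The constructor signature matches Spring Boot 2.2.x and may differ elsewhere.
@Component
public class CustomJpaBatchConfigurer extends JpaBatchConfigurer {

    public CustomJpaBatchConfigurer(BatchProperties properties, DataSource dataSource,
            TransactionManagerCustomizers transactionManagerCustomizers,
            EntityManagerFactory entityManagerFactory) {
        super(properties, dataSource, transactionManagerCustomizers, entityManagerFactory);
    }

    @Override
    protected PlatformTransactionManager createTransactionManager() {
        // JpaBatchConfigurer already creates a JpaTransactionManager here;
        // override only if you need to customize it further
        return super.createTransactionManager();
    }
}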

Mahmoud Ben Hassine
  • I've added a "BatchConfigurer" bean per your post; that didn't work. I then removed that and had the "BatchConfiguration" class I posted above extend "DefaultBatchConfigurer", overriding the methods that way; that didn't work either. I tried transaction managers other than the PlatformTransactionManager, using different ones in the Step vs. the JobRepository, and a few more things, and absolutely nothing works on Spring >= 2.2.1 (it works fine on Spring 2.2.0). – ennth May 22 '21 at 04:59
  • I'm slightly confused by your comment about the JobRepository not being picked up by Spring, as I have it defined exactly this way, and on Spring Boot starter 2.2.0 the data persists just fine to the database, with no need to define any BatchConfigurer anonymous class or extend any classes. I'm going through the documentation now... – ennth May 22 '21 at 05:02
  • Also, per the docs: "Configuring a JobRepository: When using @EnableBatchProcessing, a JobRepository is provided out of the box for you. This section addresses configuring your own." Which further explains why my configuration posted above actually did work, with the JobRepository bean defined as I did... – ennth May 22 '21 at 06:11
  • Please provide a [minimal complete example](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the issue. – Mahmoud Ben Hassine May 22 '21 at 08:33
  • Hi @Mahmoud Ben Hassine, I have made a very clean and minimal MVP/POC per your requirements - https://github.com/alpizano/spring-batch-data-jpa-persistence-issue-mvp Can you please help out? I made a Spring Profile for Postgres that you can run -Dspring.profiles.active=postgres, I will add to the README.md right now – ennth May 24 '21 at 02:32
  • I also found that the problem seems to be with the "taskExecutor" that I define in the "BatchConfiguration.java" class. If I comment out/remove it from the TaskletStep builder method in the BatchConfiguration class, I can go up to Spring 2.5.0 and I don't see any of the issues I was seeing. I think this needs to be multithreaded, though, as I am reading millions of records from Amazon S3 (I only read from .csv to create the MVP/POC for you guys to help me debug). Any ideas how I can mimic the taskExecutor and still get this to work? Why does it work fine on Spring <= 2.2.0? – ennth May 24 '21 at 05:11
  • https://github.com/spring-projects/spring-boot/issues/26133 link to original issue – ennth May 24 '21 at 05:15
  • Thanks for the minimal sample. I will try to take a look and get back to you asap. – Mahmoud Ben Hassine May 24 '21 at 18:57
  • I'm not able to reproduce your issue with your minimal example, see https://github.com/benas/spring-batch-lab/tree/main/issues/so67631238. I only removed the web code which is not needed. The data is persisted correctly with both boot 2.2.0 and 2.2.1 without any code change and without commenting the task executor, see `result-2.2.0.md` and `result-2.2.1.md` files. – Mahmoud Ben Hassine May 25 '21 at 09:22
  • Well, that is utterly perplexing. I just added a RestController endpoint when loading CSV data locally, and that seems to work fine. Also, the main use of the app is to be triggered by an AWS message to an SQS queue, and that seems to work fine on any version of Spring. I appreciate you taking the time to respond and attempt to reproduce the issue! Thank you! – ennth May 25 '21 at 13:26