I have this Spring Batch configuration for my workflow job, and I am using a SQL Server database for my Spring Batch tables:

public class MyConfiguration extends AbstractConfiguration {
   
    @Bean
    @Qualifier("pollStep")
    public Step pollStep() {
        return stepBuilderFactory.get("pollStep")
                                 .tasklet(filePollingTasklet())
                                 .listener(promoteContextListener())
                                 .build();
    }

    @Bean
    @StepScope
    // @Bean methods in a @Configuration class must not be private
    public Tasklet filePollingTasklet() {
        return this::getStatus;
    }

    private RepeatStatus getStatus(StepContribution stepContribution, ChunkContext chunkContext) {
        //some code
        Map<String, Boolean> result = poller.pollForFile(myContext, sourceInfo);
        return RepeatStatus.FINISHED;
    }

}

My application polls for a file on a remote server. If it can't find the file after 100 minutes, poller.pollForFile() throws a runtime exception, my step status becomes UNKNOWN, and the application exits with these exceptions:

c.m.s.j.SQLServerException: Connection reset
    at c.m.s.j.SQLServerConnection.terminate(SQLServerConnection.java:1667)
    at c.m.s.j.SQLServerConnection.terminate(SQLServerConnection.java:1654)
    at c.m.s.j.TDSChannel.write(IOBuffer.java:1805)
    at c.m.s.jdbc.TDSWriter.flush(IOBuffer.java:3581)
    at c.m.s.jdbc.TDSWriter.writePacket(IOBuffer.java:3482)
    at c.m.s.jdbc.TDSWriter.endMessage(IOBuffer.java:3062)
    at c.m.s.j.TDSCommand.startResponse(IOBuffer.java:6120)
    at c.m.s.j.TDSCommand.startResponse(IOBuffer.java:6106)
    at c.m.s.j.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1756)
    at c.m.s.j.TDSCommand.execute(IOBuffer.java:5696)
    at c.m.s.j.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
    at c.m.s.j.SQLServerConnection.connectionCommand(SQLServerConnection.java:1761)
    at c.m.s.j.SQLServerConnection.rollback(SQLServerConnection.java:1964)
    at c.z.h.p.ProxyConnection.rollback(ProxyConnection.java:375)
    at c.z.h.p.HikariProxyConnection.rollback(HikariProxyConnection.java)
    at o.h.r.j.i.AbstractLogicalConnectionImplementor.rollback(AbstractLogicalConnectionImplementor.java:116)
    ... 50 common frames omitted
Wrapped by: <#7f0e356a> o.h.TransactionException: Unable to rollback against JDBC Connection
    at ...

I think the SQL Server connection timed out and was closed, so Spring Batch is unable to perform the rollback and the database updates. Ideally, I want the status to be FAILED, which it is when I run locally with H2. What strategy or techniques can I use to overcome this issue on this instance? The exit message doesn't contain the error from the exception thrown by pollForFile(); instead it is: org.springframework.transaction.TransactionSystemException: Could not roll back JPA transaction; nested exception is org.hibernate.TransactionException: Unable to rollback against JDBC Connection

Is there a way to fix this issue? What if I were to move from a tasklet to a chunk-oriented step and perform the poll logic in the read() method of an ItemReader?

M06H

1 Answer

Your thinking is correct. When the commit fails, Spring Batch is unable to correctly update the step status, which ends up as UNKNOWN instead of FAILED. There is an open issue for that here: https://github.com/spring-projects/spring-batch/issues/1826. While your exception is different, the problem is the same. I attempted to fix that here: https://github.com/spring-projects/spring-batch/pull/591, but I decided to discard it (you can find more details about the reasons in that PR).

To work around the issue, you need to make sure any (runtime) exception is handled in the tasklet (or in the item writer in the case of a chunk-oriented step). In your case, you can increase the timeout of your transaction and catch the runtime exception in the tasklet (you can wrap it in a meaningful exception that you re-throw from the tasklet to make the step fail).
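As a rough illustration of the "catch and re-throw" part, here is a minimal plain-Java sketch. The names `FilePoller`, `FilePollingException` and `PollingSupport` are hypothetical stand-ins (the real poller type is not shown in the question), and the sketch deliberately avoids Spring types so it can be read on its own:

```java
import java.util.Map;

// Hypothetical stand-in for the question's poller; the real type is not shown in the post.
interface FilePoller {
    Map<String, Boolean> pollForFile();
}

// Hypothetical exception type that carries a readable failure reason.
class FilePollingException extends RuntimeException {
    FilePollingException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class PollingSupport {
    private PollingSupport() {}

    // Runs one polling call and converts any runtime failure into a
    // FilePollingException, keeping the original exception as the cause.
    static Map<String, Boolean> pollOrFail(FilePoller poller) {
        try {
            return poller.pollForFile();
        } catch (RuntimeException e) {
            throw new FilePollingException("File polling failed: " + e.getMessage(), e);
        }
    }
}
```

Inside the tasklet, the call would presumably look like `PollingSupport.pollOrFail(() -> poller.pollForFile(myContext, sourceInfo))` before returning `RepeatStatus.FINISHED`. Re-throwing is what makes the step fail; note that this only makes the failure cause explicit and does not by itself restore a database connection that has already been reset.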

EDIT: added an example of increasing the transaction timeout:

@Bean
@Qualifier("pollStep")
public Step pollStep() {
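   // org.springframework.transaction.interceptor.DefaultTransactionAttribute
   DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
   // the timeout is in seconds: 60 * 100 = 6000 s = 100 minutes
   attribute.setTimeout(60 * 100);
   // set other transaction attributes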
   return stepBuilderFactory.get("pollStep")
                            .tasklet(filePollingTasklet())
                            .transactionAttribute(attribute)
                            .listener(promoteContextListener())
                            .build();
}
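
Because `setTimeout` expects an `int` number of seconds, it can help to derive the value from a `java.time.Duration` so the unit is visible at the call site. This is only a sketch; the helper class `TransactionTimeouts` and its method `timeoutSeconds` are hypothetical names, not part of Spring:

```java
import java.time.Duration;

final class TransactionTimeouts {
    private TransactionTimeouts() {}

    // Converts a Duration to whole seconds, the unit expected by
    // DefaultTransactionAttribute.setTimeout(int). Throws ArithmeticException
    // if the value does not fit in an int.
    static int timeoutSeconds(Duration timeout) {
        return Math.toIntExact(timeout.getSeconds());
    }
}
```

With this helper, the line above could be written as `attribute.setTimeout(TransactionTimeouts.timeoutSeconds(Duration.ofMinutes(100)));`, which passes the same value as `60 * 100`.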



Mahmoud Ben Hassine
  • If I catch the runtime exception, how can I fail the step? My connection has also been reset at this point. – M06H May 12 '21 at 17:48
  • my issue is more about strategy for dealing with connection being reset whilst the tasklet has been running for long time – M06H May 12 '21 at 17:49
  • Whilst I understand the point about catching runtime exception...not sure about how to increase the timeout of the transaction for this particular case. – M06H May 17 '21 at 08:46
  • I added an example in the answer. – Mahmoud Ben Hassine May 17 '21 at 08:54
  • Thanks...that’s very useful – M06H May 17 '21 at 08:56
  • this didn't work. I added timeout as above to my step executing tasklet. The tasklet uses a retry template and the task is long running for 90 mins. with the above setting of 100 mins timeout, I still get exception: `org.springframework.transaction.TransactionSystemException: Could not roll back JPA transaction; nested exception is org.hibernate.TransactionException: Unable to rollback against JDBC Connection` – M06H May 19 '21 at 16:28
  • Well, in that case it is not the timeout that is causing your issue, as you mentioned in your description: `I think the sql server db connection is timed out and closed and spring batch is unable to perform rollback and db updates.` You need to check what is causing your connection to be reset: `c.m.s.j.SQLServerException: Connection reset at ..` – Mahmoud Ben Hassine May 19 '21 at 16:36
  • I had to resolve the issue by moving the tasklet logic into ItemReader and doing it like chunk oriented and keeping the connection alive. Not ideal but this works for now. – M06H May 21 '21 at 11:17