I am using Kafka Streams in a critical application, and I am running into a problem where transactions expire on idle stream threads. After a rebalance, a task can shift to a previously idle thread whose transactional producer has already expired. However, this doesn't become apparent until that producer tries to send for the first time, at which point it throws a ProducerFencedException and the stream shuts down. We then have to recycle the application to get it processing again, which isn't acceptable.
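For reference, this is roughly how we surface the failure today (a minimal sketch; `streams` is our `KafkaStreams` instance, construction elided). In the 1.1.0 client the uncaught exception handler can only observe the dead thread, not replace it:

```java
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.streams.KafkaStreams;

// streams is the KafkaStreams instance built from our topology (construction elided).
streams.setUncaughtExceptionHandler((thread, exception) -> {
    // The fencing error may arrive wrapped, so check the cause as well.
    Throwable cause = exception.getCause() != null ? exception.getCause() : exception;
    if (cause instanceof ProducerFencedException) {
        System.err.println("Stream thread " + thread.getName()
                + " died: its transactional producer was fenced/expired");
    }
});
```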
Here is the application setup:
- Single topic with 2 partitions
- 4 instances of the Spring Boot application, each running 2 stream threads. The extra instances exist because this is a critical application: we have to allow for 2 instances being down for server patching while still keeping multiple instances (i.e. 2) running for resiliency. Each instance is capable of handling the full load on its own within SLAs. The relevant streams configuration is sketched after this list.
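The relevant parts of our streams configuration look roughly like this (the application id and bootstrap servers are placeholders; exactly-once processing is what gives each task a transactional producer):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-critical-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder
// 2 threads per instance; with 4 instances that is 8 threads for only 2 tasks,
// so at least 6 threads are idle at any given time.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
// Transactions imply exactly-once processing.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
```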
I'd appreciate any insights into how we can set up our Kafka Streams application or Kafka cluster so that transactions don't expire with this setup.
Relevant Versions: Kafka cluster version: 1.0.0, Kafka client version: 1.1.0, Spring Boot version: 2.0.0
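From what we have read, the broker-side `transactional.id.expiration.ms` setting (default 604800000 ms, i.e. 7 days) governs how long an idle transactional id is retained before the coordinator expires it, so we suspect that is the relevant knob. Below is a sketch (broker address and broker id are placeholders) of how we would check the current value on each broker with the 1.1.0 `AdminClient`:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class CheckTransactionalIdExpiration {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // "0" is a placeholder broker id; repeat for every broker in the cluster.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                    .all().get().get(broker);
            // Idle transactional ids older than this are expired by the coordinator.
            ConfigEntry entry = config.get("transactional.id.expiration.ms");
            System.out.println(entry == null ? "not reported" : entry.value());
        }
    }
}
```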