
The streaming application was rolled out to production, and about 10 days later we started observing errors/warnings in the CustomProductionExceptionHandler for expired transactions belonging to an older day window.

FLOW :

INPUT TOPIC --> STREAMING APPLICATION (produces stats and emits them after the day window closes) --> OUTPUT TOPIC
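For reference, here is a minimal sketch of the kind of topology described above (the aggregation, topic names, serdes, and grace period are illustrative placeholders, not the actual application):

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class DaySummaryTopology {

    // Hypothetical reconstruction: a per-key count over 1-day event-time
    // windows, emitted only once the window closes (via suppress()).
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("INPUT-TOPIC", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .windowedBy(TimeWindows.of(Duration.ofDays(1)).grace(Duration.ofHours(1)))
               .count()
               .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
               .toStream()
               .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count.toString()))
               .to("OUTPUT-TOPIC", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}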

The producer continuously tries to publish records for an already-expired older window to the OUTPUT topic, and logs an error via the CustomProductionExceptionHandler.

I have reduced the batch size back to the default, but this change has not yet been promoted to production.

CustomProductionExceptionHandler implementation: added to avoid the streaming application dying due to NetworkException or TimeoutException.

With this implementation the producer does not retry; in case of any exception it returns CONTINUE. On the other hand, upon returning FAIL, the stream thread dies and does not restart automatically. Need suggestions.

import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomProductionExceptionHandler implements ProductionExceptionHandler {

    private static final Logger logger = LoggerFactory.getLogger(CustomProductionExceptionHandler.class);

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Null-safe decoding; SLF4J logs the trailing exception argument as the throwable.
        final String recordKey = record.key() == null ? "null" : new String(record.key(), StandardCharsets.UTF_8);
        final String recordVal = record.value() == null ? "null" : new String(record.value(), StandardCharsets.UTF_8);
        logger.error("Kafka message marked as processed although it failed. Message: [{}:{}], destination topic: [{}]",
                recordKey, recordVal, record.topic(), exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) { /* no-op */ }
}
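One variant I am considering (a sketch only, not what is deployed): return CONTINUE just for the TimeoutException expiry case and FAIL for everything else, so silent record skipping is limited to the known scenario. A drop-in replacement for the handle method above:

    // Hypothetical variant: skip only records expired in the producer buffer
    // (TimeoutException); fail fast on all other production errors so that
    // unexpected failures are not masked.
    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        if (exception instanceof org.apache.kafka.common.errors.TimeoutException) {
            logger.error("Skipping record expired in the producer buffer, topic [{}]", record.topic(), exception);
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        // Anything else stops the stream thread instead of dropping data silently.
        return ProductionExceptionHandlerResponse.FAIL;
    }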

Exception:

2019-12-20 16:31:37.576 ERROR com.jpmc.gpg.exception.CustomProductionExceptionHandler.handle(CustomProductionExceptionHandler.java:19) kafka-producer-network-thread | profile-day-summary-generator-291e69b1-5a3d-4d49-8797-252c2ae05607-StreamThread-19-producerid - Kafka message marked as processed although it failed. Message: [{"statistics":{}], destination topic: [OUTPUT-TOPIC]

org.apache.kafka.common.errors.TimeoutException: Expiring * record(s) for TOPIC:1086149 ms has passed since batch creation

I am trying to get answers to the questions below.

1) Why is the producer trying to publish older transactions, whose day window is already closed, to the OUTPUT topic?

Example: the producer is trying to send a transaction from the 12/09 day window, but the currently open window is 12/20.

2) Without the CustomProductionExceptionHandler returning ProductionExceptionHandlerResponse.CONTINUE, the stream threads would have died. Is there any way the producer can retry on NetworkException or TimeoutException and then continue, instead of the stream thread dying? The problem with returning ProductionExceptionHandlerResponse.CONTINUE from the CustomProductionExceptionHandler is that on any exception it skips publishing that record to the output topic and proceeds with the next records. No resiliency.


1 Answer


1) It's not really possible to answer this question without knowing what your program does. Note that, in general, Kafka Streams works on event-time and handles out-of-order data.

2) You can configure all internally used clients of a Kafka Streams application (i.e., consumer, producer, admin client, and restore consumer) by specifying the corresponding client configuration in the Properties you pass into KafkaStreams. If you want different configs for different clients, you can prefix them accordingly, i.e., producer.retries instead of retries. Check out the docs for more details: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#ak-consumers-producer-and-admin-client-configuration-parameters
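For example, a minimal sketch (the application id is taken from the log above; the bootstrap server and the timeout value are placeholders) that raises retries and the delivery timeout for the internal producer only:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class StreamsClientConfigExample {

    public static KafkaStreams create(final Topology topology) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-day-summary-generator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        // producerPrefix() forwards the config to the internal producer only.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), Integer.MAX_VALUE);
        // Give batches longer to be delivered before they expire with a TimeoutException.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG), 300_000);
        return new KafkaStreams(topology, props);
    }
}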

Matthias J. Sax