
Requirement - Consume only the latest messages from the topic on Manual Restart or Unexpected Failure

When a Flink job fails and is restarted, it starts from the restored checkpoint and tries to process the Kafka records stored in that state. To avoid these old records, I tried changing the group id, but the records from the checkpoint are still being processed.

I am using the following code to process only the latest records, and it works. The only problem is that I am not able to make the Flink Kafka consumer ignore the state from the checkpoint in case of an unexpected failure.

Code: `myConsumer.setStartFromLatest();`

Documentation: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
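
For context, here is a minimal, self-contained sketch of how this consumer is wired up. The topic name, group id, and broker address are placeholders, not taken from the actual job:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class LatestOnlyJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpointing is needed by other operators in the job

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-consumer-group");       // placeholder

        FlinkKafkaConsumer<String> myConsumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

        // Start from the latest offset on a fresh start. On recovery from a
        // checkpoint, Flink restores the offsets stored in state and this
        // setting is ignored -- which is exactly the behavior described above.
        myConsumer.setStartFromLatest();

        env.addSource(myConsumer).print();
        env.execute("latest-only");
    }
}
```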

My only requirement is to process the latest events from Kafka.

Thank you

Raghavendar
  • If the checkpoints are not helpful, why not just disable them? – David Anderson Apr 04 '20 at 13:05
  • We have multiple use cases to cover in the job. One of the cases doesn't require checkpoints. If there is no other way, we will have to split the business logic into multiple jobs. Do you have any other idea? – Raghavendar Apr 04 '20 at 17:53
  • I can't think of any way to get Flink's kafka consumer to ignore the offsets in its state during a restore. – David Anderson Apr 04 '20 at 19:00
  • You could try changing the uid of the Kafka source and then restoring your job with the `--allowNonRestoredState` option. This way you should be able to ignore the state of the Kafka consumer. Unfortunately, though, it will still remain in the checkpoint. – Dawid Wysakowicz Apr 05 '20 at 13:07
  • I tried changing the groupId in the Kafka source. Flink still replays events from the checkpoint. I need checkpointing as there are other operators which need state. – Raghavendar Apr 05 '20 at 17:17
  • @Raghavendar: Facing a similar issue. Were you able to find a solution? – Shadow Feb 16 '21 at 13:02
  • @Shadow If checkpointing is enabled and the job crashes, the job manager will recover the job from the last checkpoint and there is no way to ignore that checkpoint. If checkpointing is disabled, you can enable Kafka auto commit, which commits the offsets to Kafka at periodic intervals. So whether the job crashes and recovers or is started manually, the `myConsumer.setStartFromLatest();` option will read from the latest offset in Kafka (a sketch of this variant appears after these comments). In both cases there is a possibility of duplicate records reaching the downstream operators, since checkpointing/Kafka auto commit runs periodically. – Raghavendar Feb 17 '21 at 04:58
  • @Shadow Also, you can introduce a filter (usable only if the incoming event has a created-date field) to ignore old events. Initialize the filter with a class-level date/time field set in the open() method, then compare the created-date field of the incoming event against that field to pass only the latest events (see the filter sketch after these comments). – Raghavendar Feb 17 '21 at 04:58
  • @Raghavendar: Thanks for the reply, but I am not able to read from the start of the queue. Ideally, changing group.id should accomplish this, right? Even if we enable Kafka auto commit, correct? – Shadow Feb 17 '21 at 09:59
  • 1
  • @Shadow Assuming checkpointing is enabled, the following code will read from the start of the topic: `FlinkKafkaConsumer consumer = new FlinkKafkaConsumer<>(...); consumer.setStartFromEarliest();` However, if the job crashes, the offsets will have been checkpointed into state periodically, so when the job manager restores the job after a failure, the offsets from the checkpoint are used (ignoring the configuration `consumer.setStartFromEarliest()`). Docs: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration – Raghavendar Feb 20 '21 at 04:10
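
A minimal sketch of the no-checkpointing variant described in the comments above: checkpointing disabled, offsets committed back to Kafka via auto commit. The broker address, group id, topic name, and commit interval are illustrative assumptions:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class LatestOnlyNoCheckpoints {
    public static void main(String[] args) throws Exception {
        // No env.enableCheckpointing(...) call: with checkpointing disabled,
        // the Kafka consumer falls back to Kafka's periodic auto commit.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-consumer-group");       // placeholder
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");     // illustrative interval

        FlinkKafkaConsumer<String> myConsumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

        // Honored on every start, since there is no checkpointed offset state to restore.
        myConsumer.setStartFromLatest();

        env.addSource(myConsumer).print();
        env.execute("latest-only-no-checkpoints");
    }
}
```

Since auto commit runs periodically, some records between the last committed offset and the crash may still be re-read or skipped; this trade-off is inherent to the approach.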
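And a minimal sketch of the created-date filter suggested in the comments. `MyEvent` and its `getCreatedTimeMillis()` accessor are hypothetical stand-ins for the actual event type:

```java
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical event type with a created-date field, as assumed in the comment.
class MyEvent {
    private long createdTimeMillis;
    public long getCreatedTimeMillis() { return createdTimeMillis; }
}

public class DropStaleEventsFilter extends RichFilterFunction<MyEvent> {
    private transient long startTimeMillis;

    @Override
    public void open(Configuration parameters) {
        // Captured once per task (re)start, including restores from a checkpoint.
        startTimeMillis = System.currentTimeMillis();
    }

    @Override
    public boolean filter(MyEvent event) {
        // Pass only events created after this task instance came up, so records
        // replayed from restored offsets are dropped downstream.
        return event.getCreatedTimeMillis() >= startTimeMillis;
    }
}
```

Applied as `stream.filter(new DropStaleEventsFilter())` right after the Kafka source. Note that this compares the producer's clock against the task's clock, so clock skew between the two can let some stale events through or drop some fresh ones.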

0 Answers