After spending several days on this, I came up with the following solution.
The key idea is to do the synchronization in two modes, Recovery and Normal:
- In Recovery mode I only consume data; I do not produce any.
- In Normal mode I consume and produce data.
In Kafka I implemented this using two kinds of listeners that belong to different consumer groups. On startup all listeners are stopped, and I decide which kind of listener gets enabled. Once the offsets of all recovery listeners reach the watermarks of the normal listeners, I stop the recovery listeners and start the normal listeners.
Below is the relevant part of my code:
    public void startListeners() {
        log.debug("get partitions from application");
        final List<KafkaPartitionStateKey> partitions = getPartitions();
        log.debug("load partition state from hazelcast");
        final Map<KafkaPartitionStateKey, KafkaPartitionState> kafkaPartitionStates = kafkaPartitionStateService.loadKafkaPartitionStateMap();
        log.debug("check if in sync");
        if (areAllPartitionsReady(partitions, kafkaPartitionStates)) {
            // Everything is already in sync: go straight to Normal mode.
            log.info("all partitions ready, no need to start recovery");
            this.messageListenerContainers.forEach(this::startContainer);
            return;
        }
        log.debug("load consumer group offsets from kafka");
        consumerGroupOffsets = getConsumerGroupOffsets();
        log.debug("create missing partition states");
        final List<KafkaPartitionState> updatedPartitionStates = getOrCreatePartitionStates(partitions, kafkaPartitionStates, consumerGroupOffsets);
        log.debug("check if all partitions are ready");
        if (getNumberOfNotReadyPartitions(updatedPartitionStates) == 0) {
            // The missing states have been created and nothing is left to recover.
            log.info("all partitions ready, no need to start recovery");
            this.messageListenerContainers.forEach(this::startContainer);
            return;
        }
        // At least one partition is behind: start in Recovery mode (consume only).
        log.info("----- STARTING RECOVERY -----");
        this.recoveryListenerContainers.forEach(this::startContainer);
    }
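The method above only covers the decision at startup. The later switch from Recovery to Normal depends on a check of whether every recovery offset has reached its watermark. The following is a minimal sketch of such a check, not my production code: the class `RecoveryGate`, the method `hasCaughtUp`, and the use of plain `Map<Integer, Long>` (partition number to offset) are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical helper that decides when recovery may hand over to normal mode. */
final class RecoveryGate {

    private RecoveryGate() {
    }

    /**
     * Returns true once the recovery offset of every partition has reached
     * the watermark recorded for that partition.
     *
     * @param recoveryOffsets current offset of the recovery listeners, per partition (assumed shape)
     * @param watermarks      target offset of the normal listeners, per partition (assumed shape)
     */
    static boolean hasCaughtUp(Map<Integer, Long> recoveryOffsets,
                               Map<Integer, Long> watermarks) {
        for (Map.Entry<Integer, Long> entry : watermarks.entrySet()) {
            Long position = recoveryOffsets.get(entry.getKey());
            // A partition without a recorded recovery offset has not been consumed yet.
            if (position == null || position < entry.getValue()) {
                return false;
            }
        }
        // Also true when there are no watermarks, i.e. nothing to recover.
        return true;
    }
}
```

A recovery listener could call `hasCaughtUp` after each processed record or batch; once it returns true, the recovery containers are stopped and the normal containers are started.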
I hope this is useful to somebody.