I have a pipeline that consumes data from a Kafka topic (the topic uses compaction!). How can I terminate it after all messages have been read? For example: stop emitting messages after x amount of time has passed since the last message, and terminate the read gracefully. I know that I can create a custom consumer for that, but does another way exist? Thanks in advance.
Problem: I need withMaxNumRecords so that the GroupByKey operation works (it works only for bounded sources), but this record count can be wrong when compaction is on, and Beam's KafkaReader can't break out of its polling loop.
private static PTransform<PBegin, PCollection<KafkaRecord<String, GenericRecord>>> getCountryRecords(
        String kafkaBroker, String topic, Map<String, Object> kafkaProperties, Properties properties) {
    return KafkaIO.<String, GenericRecord>read()
            .withBootstrapServers(kafkaBroker)
            .withConsumerFactoryFn(new ConsumerFactoryFn(topic, properties))
            .withTopic(topic)
            //.withMaxNumRecords(number)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class,
                    NullableCoder.of(AvroGenericCoder.of(Entity.getClassSchema())))
            .withConsumerConfigUpdates(kafkaProperties)
            .withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object) "earliest"))
            .withConsumerConfigUpdates(ImmutableMap.of("specific.avro.reader", (Object) "true"))
            .withConsumerConfigUpdates(ImmutableMap.of("enhanced.avro.schema.support", (Object) "true"));
}
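I also considered withMaxReadTime, which makes the read bounded by capping the total read duration. A minimal sketch (the 5-minute cutoff is an arbitrary value of mine); the drawback is that it stops after a fixed wall-clock time rather than some time after the last message:

return KafkaIO.<String, GenericRecord>read()
        .withBootstrapServers(kafkaBroker)
        .withTopic(topic)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class,
                NullableCoder.of(AvroGenericCoder.of(Entity.getClassSchema())))
        // Turns the unbounded source into a bounded one: reading stops after the
        // given wall-clock duration, even if new messages keep arriving.
        .withMaxReadTime(org.joda.time.Duration.standardMinutes(5)); // arbitrary cutoff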
Here I'm trying to close the consumer after polling all messages from the topic:
private static class ConsumerFactoryFn implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {
    private final String topic;
    private final Properties properties;

    public ConsumerFactoryFn(String topic, Properties properties) {
        this.topic = topic;
        this.properties = properties;
    }

    @Override
    public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
        return new CustomCountryConsumer(new HashMap<>(config), topic, properties);
    }
}
static class CustomCountryConsumer extends KafkaConsumer<byte[], byte[]> {
    private final Properties properties;
    private final String topic;

    public CustomCountryConsumer(Map<String, Object> configs, String topic, Properties properties) {
        super(configs);
        this.topic = topic;
        this.properties = properties;
    }

    @Override
    public ConsumerRecords<byte[], byte[]> poll(long timeoutMs) {
        int emptyPollCount = 0;
        int maxRetryPollCount = 5;
        // A separate consumer drains the topic inside this single poll() call.
        Consumer<byte[], byte[]> consumer =
                new KafkaConsumer<>(PropertiesProvider.getConsumerKafkaProperties(properties));
        consumer.subscribe(Arrays.asList(topic));
        List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                .map(p -> new TopicPartition(topic, p.partition()))
                .collect(Collectors.toList());
        // Snapshot of the end offsets, used to decide when each partition is drained.
        Map<TopicPartition, Long> partitionLastOffsetMap = consumer.endOffsets(partitions);
        Map<TopicPartition, Boolean> partitionIsConsumedMap = new HashMap<>();
        partitionLastOffsetMap.keySet().forEach(el -> partitionIsConsumedMap.put(el, false));
        ConsumerRecords<byte[], byte[]> records;
        while (true) {
            records = consumer.poll(java.time.Duration.ofMillis(1000));
            if (records.isEmpty()) {
                emptyPollCount++;
                // Give up after several consecutive empty polls.
                if (emptyPollCount >= maxRetryPollCount) {
                    consumer.close();
                    break;
                }
            } else {
                emptyPollCount = 0;
            }
            for (ConsumerRecord<byte[], byte[]> record : records) {
                long currentOffset = record.offset();
                TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
                // Fragile on a compacted topic: the record at endOffset - 1 may no longer
                // exist (e.g. a cleaned-up tombstone), so this condition can never fire.
                if (currentOffset == partitionLastOffsetMap.get(topicPartition) - 1) {
                    partitionIsConsumedMap.put(topicPartition, true);
                }
            }
            if (partitionIsConsumedMap.values().stream().allMatch(Boolean.TRUE::equals)) {
                consumer.close();
                break;
            }
        }
        // Note: only the last polled batch is returned to the caller.
        return records;
    }
}
But this approach doesn't work: KafkaUnboundedReader keeps polling, and I don't know how to stop it (I would need to call close() on it somehow).
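For what it's worth, the end-of-partition check that I believe is compaction-safe compares the consumer's position against the end offsets instead of matching individual record offsets (offsets in a compacted topic are not contiguous). A standalone sketch, not wired into Beam; process() is a hypothetical placeholder for whatever handles a record, and PropertiesProvider is the same helper as above:

try (Consumer<byte[], byte[]> consumer =
        new KafkaConsumer<>(PropertiesProvider.getConsumerKafkaProperties(properties))) {
    List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
            .map(p -> new TopicPartition(topic, p.partition()))
            .collect(Collectors.toList());
    consumer.assign(partitions);
    consumer.seekToBeginning(partitions);
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
    boolean done = false;
    while (!done) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(java.time.Duration.ofMillis(1000));
        records.forEach(record -> process(record)); // process() is a placeholder
        // position() is the offset of the next record to fetch, so it reaches the
        // end offset exactly when the partition is drained, compaction gaps or not.
        done = partitions.stream()
                .allMatch(tp -> consumer.position(tp) >= endOffsets.get(tp));
    }
}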