
I have a pipeline which consumes data from a Kafka topic (the topic uses compaction!). How can I terminate after reading all messages? For example, stop emitting messages after X amount of time has passed since the last message and terminate the read gracefully. I know that I can create a custom consumer for that, but is there another way? Thanks in advance.

Problem: I need to use withMaxNumRecords for a GroupByKey operation (it works only for bounded sources), but this value can be wrong when compaction is on, and the KafkaReader from Beam can't break the loop.
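
The closest built-in thing I've found so far is bounding the read by total wall-clock time instead of by record count; just a sketch, and I'm not sure withMaxReadTime fits here, since it counts from the start of the read, not from the last message:

    // Sketch only: bounds the read by total read time, not by "time since the last
    // message"; assumes withMaxReadTime is available in the KafkaIO version in use.
    KafkaIO.<String, GenericRecord>read()
        .withBootstrapServers(kafkaBroker)
        .withTopic(topic)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class, NullableCoder.of(AvroGenericCoder.of(Entity.getClassSchema())))
        .withMaxReadTime(Duration.standardMinutes(10)) // org.joda.time.Duration; makes the source bounded
        .withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object) "earliest"));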

    private static PTransform<PBegin, PCollection<KafkaRecord<String, GenericRecord>>> getCountryRecords(
            String kafkaBroker, String topic, Map<String, Object> kafkaProperties, Properties properties) {
        return KafkaIO.<String, GenericRecord>read()
            .withBootstrapServers(kafkaBroker)
            .withConsumerFactoryFn(new ConsumerFactoryFn(topic, properties))
            .withTopic(topic)
            //.withMaxNumRecords(number)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class, NullableCoder.of(AvroGenericCoder.of(Entity.getClassSchema())))
            .withConsumerConfigUpdates(kafkaProperties)
            .withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object) "earliest"))
            .withConsumerConfigUpdates(ImmutableMap.of("specific.avro.reader", (Object) "true"))
            .withConsumerConfigUpdates(ImmutableMap.of("enhanced.avro.schema.support", (Object) "true"));
    }

Here I'm trying to close the consumer after polling all messages from the topic:

    private static class ConsumerFactoryFn implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {
        String topic;
        Properties properties;

        public ConsumerFactoryFn(String topic, Properties properties) {
            this.topic = topic;
            this.properties = properties;
        }

        public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
            return new CustomCountryConsumer(new HashMap<>(config), topic, properties);
        }
    }

    static class CustomCountryConsumer extends KafkaConsumer<byte[], byte[]> {
        Properties properties;
        String topic;

        public CustomCountryConsumer(Map<String, Object> configs, String topic, Properties properties) {
            super(configs);
            this.topic = topic;
            this.properties = properties;
        }

        @Override
        public ConsumerRecords<byte[], byte[]> poll(long timeoutMs) {
            int emptyPollCount = 0;
            int maxRetryPollCount = 5;
            // Separate consumer used to find out where the end of each partition is
            Consumer<byte[], byte[]> consumer = new KafkaConsumer<>(PropertiesProvider.getConsumerKafkaProperties(properties));
            consumer.subscribe(Arrays.asList(topic));
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                .map(p -> new TopicPartition(topic, p.partition()))
                .collect(Collectors.toList());

            Map<TopicPartition, Long> partitionLastOffsetMap = consumer.endOffsets(partitions);

            // Track which partitions have been read up to their end offset
            Map<TopicPartition, Boolean> partitionIsConsumedMap = new HashMap<>();
            partitionLastOffsetMap.keySet().forEach(el -> partitionIsConsumedMap.put(el, false));

            ConsumerRecords<byte[], byte[]> records;

            while (true) {
                records = consumer.poll(java.time.Duration.ofMillis(1000));

                // Give up after several consecutive empty polls
                if (records.isEmpty()) {
                    emptyPollCount++;
                    if (emptyPollCount >= maxRetryPollCount) {
                        consumer.close();
                        break;
                    }
                } else {
                    emptyPollCount = 0;
                }

                // Mark a partition as fully consumed once its last offset has been seen
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    long currentOffset = record.offset();
                    TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
                    if (currentOffset == partitionLastOffsetMap.get(topicPartition) - 1) {
                        partitionIsConsumedMap.put(topicPartition, true);
                    }
                }

                // Stop once every partition has been fully consumed
                if (partitionIsConsumedMap.values().stream().allMatch(Boolean.TRUE::equals)) {
                    consumer.close();
                    break;
                }
            }
            return records;
        }
    }

But this approach doesn't work: KafkaUnboundedReader continues polling, and I don't know how to stop it (I would need to call close() on it somehow).
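
The only other "ugly" workaround I can think of is not to stop the reader at all, but to kill the whole pipeline from the outside after a fixed time; a rough sketch, assuming the runner supports cancel():

    // Rough sketch: give the streaming job a fixed time budget, then cancel it.
    PipelineResult result = pipeline.run();
    // waitUntilFinish takes an org.joda.time.Duration; behaviour is runner-dependent
    PipelineResult.State state = result.waitUntilFinish(Duration.standardMinutes(10));
    if (state == null || !state.isTerminal()) {
        result.cancel(); // may throw IOException; not supported by every runner
    }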

  • In general, if you want to stop "after X minutes" you'd use a batch consumer, not streaming. Otherwise, I'm not sure Beam exposes any information about the "end" of the topic. It'll just idly poll for more data. You'd need an external thread to monitor the Kafka consumer position/lag – OneCricketeer Aug 13 '23 at 15:00
  • I absolutely agree with you, but the architecture of this app is strange and I can't change it, so I need any ugly solution – ovod Aug 14 '23 at 11:49
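
Following up on that suggestion, a rough sketch of what such an external monitor could look like: it compares the committed offsets of the pipeline's consumer group with the topic end offsets and reports when the group has caught up. The group id "beam-consumer-group" is a placeholder, and this only helps if that group actually commits its offsets (e.g. via commitOffsetsInFinalize):

    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Rough sketch: returns true once the pipeline's consumer group has caught up with
    // the end of the topic. "beam-consumer-group" is a placeholder group id; the props
    // passed in are assumed to contain the usual bootstrap/deserializer settings.
    static boolean hasCaughtUp(Properties adminProps, Properties consumerProps)
            throws ExecutionException, InterruptedException {
        try (AdminClient admin = AdminClient.create(adminProps);
             KafkaConsumer<byte[], byte[]> probe = new KafkaConsumer<>(consumerProps)) {
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("beam-consumer-group")
                     .partitionsToOffsetAndMetadata().get();
            Map<TopicPartition, Long> end = probe.endOffsets(committed.keySet());
            return !committed.isEmpty() && committed.entrySet().stream()
                .allMatch(e -> e.getValue() != null && e.getValue().offset() >= end.get(e.getKey()));
        }
    }

An external thread could call something like this periodically and cancel the pipeline (as in the snippet above) once it returns true.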
