
I have a PyFlink app written as a pure Python application, executed with "python -m flink_app.py".

Assume I have a simple DataStream app that consumes from an input Kafka topic and produces to an output Kafka topic. Due to the scale, I need to deploy this app on 2 Kubernetes pods.

Unfortunately, Flink seems to ignore the group.id in my configs: each deployment works as a standalone app, so both pods consume the same records and produce duplicates in my output topic.

Do you know how to solve this?

I tried setting group.id in the Kafka config passed to FlinkKafkaConsumer as follows:

conf = {
    'bootstrap.servers': servers,
    'group.id': 'pyflink_processor',
    'sasl.jaas.config': f'org.apache.kafka.common.security.plain.PlainLoginModule required\n'
                        f' username="{username}" password="{password}";',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'ssl.endpoint.identification.algorithm': 'https'
}
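For completeness, here is a self-contained sketch of how I build these properties and hand them to the source. The credentials, broker address, and topic name are placeholders, not my real values, and the FlinkKafkaConsumer wiring is shown in comments since it needs a PyFlink install plus the Kafka connector jar:

```python
# Sketch: build the Kafka client properties for the source.
# All concrete values below are placeholders.
username = 'user'    # placeholder
password = 'secret'  # placeholder

# The JAAS entry is a single property value and must end with ';'.
jaas = (
    'org.apache.kafka.common.security.plain.PlainLoginModule required '
    f'username="{username}" password="{password}";'
)

conf = {
    'bootstrap.servers': 'broker:9092',  # placeholder
    'group.id': 'pyflink_processor',
    'sasl.jaas.config': jaas,
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'ssl.endpoint.identification.algorithm': 'https',
}

# Wiring (requires PyFlink and flink-sql-connector-kafka on the classpath):
# from pyflink.datastream import StreamExecutionEnvironment
# from pyflink.datastream.connectors import FlinkKafkaConsumer
# from pyflink.common.serialization import SimpleStringSchema
#
# env = StreamExecutionEnvironment.get_execution_environment()
# consumer = FlinkKafkaConsumer('input-topic', SimpleStringSchema(), conf)
# env.add_source(consumer)
```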
