I have a PyFlink app running as a pure Python application, executed with `python -m flink_app.py`.
Assume a simple DataStream app that consumes from an input Kafka topic and produces to an output Kafka topic. Due to the scale, I need to deploy this app on 2 Kubernetes pods.
Unfortunately, it seems that Flink ignores the `group.id` in my configs: each deployment works as a standalone app, and therefore I get duplicates in my output topic.
Do you know how to solve this?
I tried to set `group.id` in the Kafka config passed to `FlinkKafkaConsumer` as follows:
```python
conf = {
    'bootstrap.servers': servers,
    'group.id': 'pyflink_processor',
    'sasl.jaas.config': (
        'org.apache.kafka.common.security.plain.PlainLoginModule required\n'
        f' username="{username}" password="{password}";'
    ),
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'ssl.endpoint.identification.algorithm': 'https'
}
```
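For completeness, here is a minimal sketch of how the config above is wired into the consumer. The topic name and the connector JAR path are placeholders, not my real values:

```python
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer

env = StreamExecutionEnvironment.get_execution_environment()
# The Kafka connector JAR must be on the classpath, e.g.:
# env.add_jars("file:///path/to/flink-sql-connector-kafka.jar")  # placeholder path

consumer = FlinkKafkaConsumer(
    topics='input-topic',                         # placeholder topic name
    deserialization_schema=SimpleStringSchema(),
    properties=conf,                              # the dict shown above
)
ds = env.add_source(consumer)
# ...transformations and the Kafka sink follow here...
```

Each pod runs this same script independently, which is when the duplicates appear.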