Problem:
Flink task manager reports: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
Deployment overview:
- A Java project to try out Stateful Functions. The streaming app reads messages from Kafka, processes them, and sends the final result to a Kafka egress.
- Deployed on Azure:
  - Azure Event Hub (Kafka Endpoint) as ingress and egress
  - Azure Kubernetes Service as k8s deployment
  - Azure Data Lake Gen 2 as storage for checkpoint
The deployment itself succeeds: the job manager and task manager are both launched, but the task then fails to run with the exception above.
Diagnostics:
- I created a simple Java consumer with the identical Kafka config, just with a different consumer group. It works well both on my laptop and in AKS (deployed in the same namespace as the Stateful Functions app), so I conclude that the Event Hub and my Kafka config are both good.
- I checked the task manager log (kubectl logs xxx), and the Kafka properties have been loaded correctly. The sasl.jaas.config shows as
"sasl.jaas.config = [hidden]"
but I assume this is by design.
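
For reference, the configuration used by the standalone consumer test can be sketched roughly as follows. This is a minimal illustration, not the exact code: the class name EventHubConsumerProps, the build method, the placeholder values, and the two deserializer choices are assumptions. It only assembles the standard Kafka client properties that mirror the ingress settings listed under "My Kafka Settings" and does not create a KafkaConsumer, so it needs nothing beyond the JDK.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper (not taken from the original project): assembles Kafka
// client properties that mirror the ingress settings shown in this post.
class EventHubConsumerProps {

    static Properties build(String bootstrapServers, String groupId, String connectionString) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("request.timeout.ms", "60000");
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "PLAIN");
        // Event Hubs expects the literal user name "$ConnectionString" and the
        // namespace connection string as the password.
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"$ConnectionString\" password=\"" + connectionString + "\";");
        // Assumed deserializers; the real test consumer may have used others.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only; substitute the real namespace and connection string.
        Properties props = build("xxxx.servicebus.windows.net:9093",
                "group-test-00", "<connection string>");
        // Print the settings in sorted order, masking the secret the same way
        // the Kafka client log does.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            String value = name.equals("sasl.jaas.config") ? "[hidden]" : props.getProperty(name);
            System.out.println(name + " = " + value);
        }
    }
}
```

With kafka-clients on the classpath, the returned Properties could be passed to new KafkaConsumer<>(props) to reproduce the test against a different consumer group.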
My Kafka Settings:
I'm using the following config:
kind: io.statefun.kafka.v1/ingress
spec:
  id: io.streaming/eventhub-ingress
  address: xxxx.servicebus.windows.net:9093
  consumerGroupId: group-receiver-00
  startupPosition:
    type: group-offsets
  topics:
    - topic: streaming-topic-rec-32
      valueType: streaming.types/rec
      targets:
        - streaming.fns/bronze_rec
    - topic: streaming-topic-eng-32
      valueType: streaming.types/eng
      targets:
        - streaming.fns/bronze_eng
  properties:
    - request.timeout.ms: 60000
    - security.protocol: SASL_SSL
    - sasl.mechanism: PLAIN
    - sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="primary connection string of the event hub ns";
Can anyone help me with this? Thank you!