I have a Kafka Connect sink running. I'd like to monitor the lag for this.
I can manually get the lag by shelling into a broker and using the kafka-consumer-groups tool like so:
unset JMX_PORT; /usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 --group connect-<my-kafka-connect-connector> --describe
That will give me something like:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
connect-<my-kafka-connect-connector> <my-topic> 0 1414248272 2775658553 1361410281 connector-consumer-<my-kafka-connect-connector>-<uuid> /<my-host-ip> connector-consumer-<my-kafka-connect-connector>-0
That's the lag information I want, but I want it as a Prometheus metric that I can put on a dashboard, monitor, and alert on.
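To make the goal concrete, here is a rough sketch of the kind of conversion I mean. It assumes the column layout shown above (GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, ...). The function name lag_to_prom and the metric name kafka_consumergroup_lag are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: read `kafka-consumer-groups --describe` output on stdin
# and print one Prometheus text-format sample per partition.
# The metric name kafka_consumergroup_lag is an assumption, not an official name.
lag_to_prom() {
  awk '
    # Skip blank or short lines and the header row.
    NF < 6 || $1 == "GROUP" { next }
    # Columns: GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
    # LAG is printed as "-" when no offset has been committed, so only
    # emit a sample when the LAG column is a plain integer.
    $6 ~ /^[0-9]+$/ {
      printf "kafka_consumergroup_lag{group=\"%s\",topic=\"%s\",partition=\"%s\"} %s\n", $1, $2, $3, $6
    }
  '
}
```

Piping the command above into lag_to_prom would produce a line such as kafka_consumergroup_lag{group="...",topic="...",partition="0"} 1361410281. That output could then be written to a file for something like the node_exporter textfile collector to pick up, but that still means running and scheduling a script myself, which is what I am hoping to avoid.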
I'm ingesting the Kafka broker metrics and the Kafka Connect metrics, neither of which seems to have this information. I've pored over the Prometheus metric output with curl and grep, and this information isn't there.
I'm running Kafka Connect via the official Confluent Helm chart (https://github.com/confluentinc/cp-helm-charts/tree/master/charts/cp-kafka-connect) with the default Prometheus metrics export. This works: I can get basic metrics, but no information on lag:
kubectl -n kafka exec -it kafka-connect-cp-kafka-connect-<id> -c cp-kafka-connect-server -- /bin/bash
# This will show exactly one metric with simple "running" status.
curl localhost:5556/metrics | grep <my-topic-name-or-connector-name>
cp_kafka_connect_connect_connector_metrics{connector="<my-connector-name>",status="running",task="0",} 1.0
I see third-party add-ons like the following: https://github.com/lightbend/kafka-lag-exporter
This seems to do exactly what I want, but I'd rather not add another third-party component to my production setup unless absolutely necessary. Do I really need a third-party utility to get something so basic? If a third-party tool is necessary, are there similar third-party utilities that I should evaluate or consider?