I am trying to calculate the lag for a consumer group hosted in Confluent Kafka using the Python code below:
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka import KafkaException, KafkaError, Consumer
from confluent_kafka import TopicPartition
import json
# Set up the configuration for the Confluent Cluster
conf = {
    'bootstrap.servers': 'pkc-43332.us-west1.gcp.confluent.cloud:9092',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'sasl.username': '<user-name>',
    'sasl.password': '<pswd>',
}
# Create the AdminClient using the configuration
admin_client = AdminClient(conf)
# List the consumer groups known to the cluster
group_metadata = admin_client.list_groups()
# Check if the consumer group is active
group_name = 'connect-consumer-group'
if group_name not in [group.id for group in group_metadata]:
    print(f"No consumer group with name {group_name} found.")
    exit()
# Get the consumer group details
group_info = admin_client.describe_consumer_groups([group_name])
group_info = group_info[group_name].result()
# Get the topic partitions for the consumer group
topic_partitions = {}
for member in group_info.members:
    for tp in member.assignment.topic_partitions:
        topic_partitions[tp.topic] = topic_partitions.get(tp.topic, []) + [tp.partition]
# Create a Consumer object
consumer_conf = {
    'bootstrap.servers': 'pkc-43332.us-west1.gcp.confluent.cloud:9092',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'sasl.username': '<user-name>',
    'sasl.password': '<pswd>',
    'group.id': group_name,
    'auto.offset.reset': 'earliest',
}
consumer = Consumer(consumer_conf)
# Calculate lag for each topic partition
for topic, partitions in topic_partitions.items():
    for partition in partitions:
        tp = TopicPartition(topic, partition)
        current_offset = consumer.position([tp])[0].offset
        end_offset = consumer.get_watermark_offsets(tp)[1]
        # Calculate lag
        lag = end_offset - current_offset
        print(f"Lag for {topic}-partition-{partition}: {lag}, end offset is {end_offset}, current offset is {current_offset}")
With this code, the current offset is always -1001 for every partition, whereas the end offset is fetched correctly from Confluent. When I run the command below in a shell, it returns the correct numbers for both the current and the end offset:
kafka-consumer-groups \
  --bootstrap-server pkc-43332.us-west1.gcp.confluent.cloud:9092 \
  --command-config /home/dbuser/client-config.properties \
  --describe --group connect-consumer-group \
  --timeout 10000
What can we do differently in Python (passing additional parameters, adding or updating configuration, etc.) to get the same current offset that the command-line tool reports?
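
For context, here is a sketch of one direction that may explain the difference, with its assumptions stated. The value -1001 is the OFFSET_INVALID sentinel. Consumer.position() reports only the in-memory fetch position of this particular consumer instance, which is unset here because the consumer was never assigned partitions and never fetched anything. The command-line tool instead reports the offsets committed by the group, which Consumer.committed() reads from the broker. The helper below is hypothetical: the name compute_lag and the fallback to the low watermark are assumptions, not part of the library. It only shows the lag arithmetic once committed offsets are used in place of position().

```python
# Sentinel used by librdkafka for "no offset available"; the same value
# is exposed as confluent_kafka.OFFSET_INVALID.
OFFSET_INVALID = -1001


def compute_lag(committed_offset, low_watermark, high_watermark):
    """Hypothetical helper: lag of one partition from its committed offset.

    Assumption: if the group has no committed offset for the partition
    (a negative sentinel such as OFFSET_INVALID), the whole retained
    range counts as lag.
    """
    if committed_offset < 0:
        return high_watermark - low_watermark
    # Clamp at zero in case the watermark was read before the latest commit.
    return max(high_watermark - committed_offset, 0)


# Intended use inside the loop from the question (needs a live cluster,
# so it is left as comments):
#   committed = consumer.committed([tp], timeout=10)[0].offset
#   low, high = consumer.get_watermark_offsets(tp, timeout=10)
#   lag = compute_lag(committed, low, high)

print(compute_lag(OFFSET_INVALID, 0, 120))  # no commit yet: full range
print(compute_lag(100, 0, 120))             # 20 messages behind
```

Whether a missing commit should count as full lag or be skipped depends on how the group's auto.offset.reset is configured, so that branch may need adjusting.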