I'm trying to benchmark Apache Kafka, and at the moment I'm measuring latency. To keep an overview of what's going on, I'm producing 10 MB of tweet data (~14,500 tweets), but I could send arbitrarily more later; those 10 MB are only for testing purposes for now. I have configured the broker to use LogAppendTime timestamps, which I compare to the current time on the consumer side. Doing this, I consistently get latency values that increase over time (see details below), which is not what I expected.
Setup
- VM with Ubuntu 20
- 2 vCPUs
- 9 GB RAM
- confluent-kafka and Apache Kafka, both in the most recent version
- 1 partition, 1 Kafka broker, all default settings (except for the LogAppendTime setting)
- Producer and consumer are both running on the same machine (at least for now). The VM is running in a data center, so bandwidth should be no problem.
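For reference, this is how I set the timestamp type on the broker; `log.message.timestamp.type` is the broker-wide property (it can also be set per topic via `message.timestamp.type`):

```properties
# server.properties (broker side)
# Stamp each record with the broker's clock at append time
# instead of the producer-supplied CreateTime timestamp
log.message.timestamp.type=LogAppendTime
```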
Constants
CONSUMER_GROUP_ID = "twitter-consumers"
KAFKA_TOPIC_TWITTER = "twitter-stream"
Producer Code
from confluent_kafka import Producer

producer_config = {
    "bootstrap.servers": "localhost:9092"
}
p = Producer(producer_config)

generation_steps = 1  # 10 MB per step

with open(TWITTER_DATA_PATH, "r") as dataset:
    for step in range(generation_steps):
        print(f"Executing data generation step {step}...")
        dataset.seek(0)  # Jump back to the first line
        for tweet in dataset:
            try:
                print("IN QUEUE: {}".format(len(p)))
                p.produce(KAFKA_TOPIC_TWITTER, value=tweet)
                p.poll(0)
            except BufferError:
                print('[INFO] Local producer queue is full (%d messages awaiting delivery): Trying again after flushing...\n' % len(p))
                p.flush()  # Send the messages in the queue
                # The tweet that caused the BufferError could not be sent, so after flushing we can send it now
                p.produce(KAFKA_TOPIC_TWITTER, value=tweet)
                p.poll(0)

print("Data generation done!")
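The BufferError handling in the loop above can be factored into a small helper so the retry logic is testable on its own. The sketch below is a hypothetical refactoring of my own (the name `produce_with_retry` is not a confluent-kafka API); the producer is duck-typed so the function can be exercised without a running broker:

```python
def produce_with_retry(producer, topic, value):
    """Produce one message; if the local queue is full, flush and retry once.

    `producer` is any object exposing produce/flush/poll, e.g. a
    confluent_kafka.Producer (or a stub in tests).
    """
    try:
        producer.produce(topic, value=value)
    except BufferError:
        # Local queue full: block until queued messages are delivered, then retry
        producer.flush()
        producer.produce(topic, value=value)
    producer.poll(0)  # serve delivery callbacks without blocking
```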
Consumer Code
from confluent_kafka import Consumer, KafkaError
import time

# Consume the data
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": CONSUMER_GROUP_ID,
    "client.id": "client-1",
    "enable.auto.commit": True,
    "session.timeout.ms": 6000,
    "default.topic.config": {"auto.offset.reset": "smallest"}
}

c = Consumer(consumer_config)
c.subscribe([KAFKA_TOPIC_TWITTER])

msg_count = 0
try:
    while True:
        msg = c.poll(0.1)
        if msg is None:
            continue
        elif not msg.error():
            delta = (time.time() * 1000) - msg.timestamp()[1]
            msg_count += 1
            print("Message {} received | Timestamp: {} | Latency: {}".format(msg_count, msg.timestamp()[1], delta))
        elif msg.error().code() == KafkaError._PARTITION_EOF:
            print("End of partition reached {}/{}".format(msg.topic(), msg.partition()))
        else:
            print("Error occurred: {}".format(msg.error().str()))
except KeyboardInterrupt:
    pass
finally:
    c.close()
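The latency computation inside the loop can be isolated into a tiny helper (a refactoring sketch of my own, not a confluent-kafka API). Note that it compares the broker's clock against the consumer's clock, which should only be valid here because broker and consumer run on the same VM:

```python
import time

def latency_ms(broker_ts_ms):
    """Milliseconds between a broker LogAppendTime timestamp and now.

    Only meaningful if the broker's and consumer's wall clocks are in
    sync (here: same machine, so clock skew should be negligible).
    """
    return time.time() * 1000 - broker_ts_ms
```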
What I get
Consumer Side
Message 1 received | Timestamp: 1618685061270 | Latency: 12.360107421875
Message 2 received | Timestamp: 1618685061270 | Latency: 13.025390625
Message 3 received | Timestamp: 1618685061270 | Latency: 13.08544921875
Message 4 received | Timestamp: 1618685061270 | Latency: 13.156005859375
Message 5 received | Timestamp: 1618685061270 | Latency: 13.1923828125
Message 6 received | Timestamp: 1618685061270 | Latency: 13.21875
Message 7 received | Timestamp: 1618685061270 | Latency: 13.245849609375
Message 8 received | Timestamp: 1618685061270 | Latency: 13.29833984375
Message 9 received | Timestamp: 1618685061270 | Latency: 13.348388671875
Message 10 received | Timestamp: 1618685061270 | Latency: 13.389892578125
Message 11 received | Timestamp: 1618685061270 | Latency: 13.43701171875
Message 12 received | Timestamp: 1618685061270 | Latency: 13.49755859375
Message 13 received | Timestamp: 1618685061270 | Latency: 13.54345703125
Message 14 received | Timestamp: 1618685061270 | Latency: 13.581787109375
Message 15 received | Timestamp: 1618685061270 | Latency: 13.626708984375
Message 16 received | Timestamp: 1618685061270 | Latency: 13.661376953125
Message 17 received | Timestamp: 1618685061270 | Latency: 13.695556640625
Message 18 received | Timestamp: 1618685061270 | Latency: 13.720703125
Message 19 received | Timestamp: 1618685061270 | Latency: 13.75537109375
Message 20 received | Timestamp: 1618685061270 | Latency: 13.78857421875
Message 21 received | Timestamp: 1618685061270 | Latency: 13.81396484375
[....]
Message 14485 received | Timestamp: 1618685061337 | Latency: 1061.004150390625
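One thing that stands out to me in the output above: messages 1-21 all carry the exact same LogAppendTime timestamp, and the broker timestamps span only 67 ms from the first to the last message, while the measured "latency" grows by roughly a second. A quick sanity script over a sample of the logged values (numbers copied from the output above) makes this visible:

```python
# Sample of (message number, broker timestamp ms, measured latency ms)
# copied from the consumer output above
samples = [
    (1,     1618685061270, 12.360107421875),
    (2,     1618685061270, 13.025390625),
    (10,    1618685061270, 13.389892578125),
    (21,    1618685061270, 13.81396484375),
    (14485, 1618685061337, 1061.004150390625),
]

# All early messages share a single append timestamp, i.e. they were
# appended to the log together
early_ts = {ts for n, ts, _ in samples if n <= 21}
print("distinct LogAppendTime values among the sampled early messages:", len(early_ts))

# Broker timestamps span only 67 ms end to end, yet the measured
# latency grows by about a second over the run
ts_span = samples[-1][1] - samples[0][1]
lat_growth = samples[-1][2] - samples[0][2]
print(f"broker timestamp span: {ts_span} ms, latency growth: {lat_growth:.0f} ms")
```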
Producer Side
Executing data generation step 0...
IN QUEUE: 0
IN QUEUE: 1
IN QUEUE: 2
IN QUEUE: 3
IN QUEUE: 4
IN QUEUE: 5
IN QUEUE: 6
IN QUEUE: 7
IN QUEUE: 8
IN QUEUE: 9
IN QUEUE: 10
IN QUEUE: 11
IN QUEUE: 12
IN QUEUE: 13
IN QUEUE: 14
IN QUEUE: 15
IN QUEUE: 16
[...]
IN QUEUE: 14484
Data generation done!
What I would expect
As you can see, I compute the latency as delta = (time.time() * 1000) - msg.timestamp()[1]. I would expect the latency to vary between roughly 10 and 100 ms at most; I don't understand why it increases almost linearly over time.
I also don't understand why the producer's buffer queue gets so large, since, as far as I understood, Kafka should clear the queue after each .poll() call.
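For reference, the confluent-kafka documentation describes flush() as essentially polling until the local queue is empty, so my mental model of the queue is roughly the sketch below (`drain_queue` is a name of my own, not a library API, and the producer is duck-typed so the sketch can be exercised without a broker):

```python
def drain_queue(producer, timeout=1.0):
    """Roughly what Producer.flush() does: poll until the local queue is empty.

    `producer` is any object supporting len() and poll(), e.g. a
    confluent_kafka.Producer (or a stub in tests); len() is the number
    of messages still awaiting delivery.
    """
    while len(producer) > 0:
        producer.poll(timeout)
```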
Does anybody know what could be causing this and what I could do about it?