
I have a very basic topic, with a retention period of 1 minute. I created it with:

kafka-topics --zookeeper $zhost --topic $name --create --partitions $partitions --replication-factor 1 --config retention.ms=60000

So my topic looks like this:

Topic: topic_quick  PartitionCount: 1   ReplicationFactor: 1    Configs: retention.ms=60000
    Topic: topic_quick  Partition: 0    Leader: 0   Replicas: 0 Isr: 0  Offline:

My producer then sends a message, and after 1 minute I try to consume it.

Expected behaviour:

  • After 1 minute my consumer shouldn't receive the message I sent 1 minute ago because of the retention period.

Current behaviour:

  • The consumer still receives the message after the retention period (1 minute) has passed.

How is this possible? It seems that the retention period is not having any effect.

Francisco Albert
  • My answer given [here](https://stackoverflow.com/questions/63137084/data-still-remains-in-kafka-topic-although-we-set-retention-hours-to-1h) might answer your question – Michael Heil Aug 23 '20 at 21:40

2 Answers


One probable answer comes to mind. In a Kafka broker, data is stored in topics, which consist of partitions, which in turn consist of segments. One broker property that I think you should set is log.segment.bytes. From "Kafka: The Definitive Guide" (available from Confluent):

The log-retention settings previously mentioned operate on log segments, not individual messages. As messages are produced to the Kafka broker, they are appended to the current log segment for the partition. Once the log segment has reached the size specified by the log.segment.bytes parameter, which defaults to 1 GB, the log segment is closed and a new one is opened. Once a log segment has been closed, it can be considered for expiration. A smaller log-segment size means that files must be closed and allocated more often, which reduces the overall efficiency of disk writes. Adjusting the size of the log segments can be important if topics have a low produce rate. For example, if a topic receives only 100 megabytes per day of messages, and log.segment.bytes is set to the default, it will take 10 days to fill one segment. As messages cannot be expired until the log segment is closed, if log.retention.ms is set to 604800000 (1 week), there will actually be up to 17 days of messages retained until the closed log segment expires. This is because once the log segment is closed with the current 10 days of messages, that log segment must be retained for 7 days before it expires based on the time policy (as the segment cannot be removed until the last message in the segment can be expired).
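The 17-day figure in the quote is just the time to fill a segment plus the retention period. A minimal sketch of that arithmetic, using the book's rounded numbers (1 GB taken as 1000 MB); the function name and parameters are made up for illustration and are not part of any Kafka API:

```python
def worst_case_retention_days(segment_mb: float,
                              produce_mb_per_day: float,
                              retention_days: float) -> float:
    """Upper bound on how long the oldest message in a segment can survive,
    following the reasoning in the quoted passage (hypothetical helper)."""
    # The segment only closes once it is full.
    days_to_fill_segment = segment_mb / produce_mb_per_day
    # After closing, it is kept until its newest message passes retention.
    return days_to_fill_segment + retention_days


# Book example: ~1 GB segment, 100 MB/day, 7-day retention.
print(worst_case_retention_days(1000, 100, 7))  # 17.0
```

With a single message and the default segment size, as in the question, the segment would essentially never fill by size, which is why the quoted passage singles out topics with a low produce rate.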

sawim
  1. The purpose of 'retention.ms' is to clean up a topic's data so that the accumulated data doesn't become a problem for the system.

  2. It doesn't strictly enforce the retention time defined by 'retention.ms'; a message can remain readable for some time after that period has elapsed.

  3. In addition to the above, the link shared by Mike in the comments on the question has a good explanation of how 'retention.ms' actually works.
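To make the "not strictly enforced" point concrete, here is a toy model of the segment-level rule as the book passage quoted in the other answer states it. It is a simplification under stated assumptions, not the broker's actual code: the `Segment` class and `deletable` function are hypothetical, and a real broker additionally runs its retention check only periodically (log.retention.check.interval.ms, 5 minutes by default), with details that may differ between Kafka versions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    newest_message_ts_ms: int  # timestamp of the last message in the segment
    closed: bool               # False for the active segment still taking writes


def deletable(segment: Segment, now_ms: int, retention_ms: int) -> bool:
    # Rule as described in the quoted passage: only a closed segment can
    # expire, and only once its newest message is older than retention.
    return segment.closed and (now_ms - segment.newest_message_ts_ms) > retention_ms


retention_ms = 60_000  # retention.ms from the question

# The question's scenario: one message at t=0 sitting in the active segment.
active = Segment(newest_message_ts_ms=0, closed=False)
print(deletable(active, now_ms=120_000, retention_ms=retention_ms))  # False

# The same message in a segment that has since been closed.
rolled = Segment(newest_message_ts_ms=0, closed=True)
print(deletable(rolled, now_ms=120_000, retention_ms=retention_ms))  # True
```

Under this model, the message from the question is still readable two minutes later simply because its segment has not been closed yet.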

Sahil Gupta