I want to implement data replay for some of our use cases, and for that I need to rely on Kafka's retention policy (I am using a join, so the window time needs to be accurate). P.S. I am using Kafka version 0.10.1.1.
I am sending data into the topic like this:
// ProducerRecord(topic, partition, timestamp, key, value): the timestamp is
// set explicitly from the record's date_time field
kafkaProducer.send(
    new ProducerRecord<>(kafkaTopic, 0, (long) r.get("date_time"), r.get(keyFieldName).toString(), r)
);
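For reference, a minimal self-contained sketch of that producer call (broker address, topic, key, and value are placeholders, and I use String values here instead of my actual Avro records):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// ProducerRecord(topic, partition, timestamp, key, value): the third argument
// sets the record timestamp explicitly (here, Jan 1st 2017 in epoch millis)
producer.send(new ProducerRecord<>("myTopic", 0, 1483272000000L, "someKey", "someValue"));
producer.close();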
And I create my topic like this:
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic myTopic
kafka-topics --zookeeper localhost --alter --topic myTopic --config retention.ms=172800000
kafka-topics --zookeeper localhost --alter --topic myTopic --config segment.ms=172800000
So with the above settings, the retention time of my topic should be 48 hours.
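To confirm the overrides were actually applied, the topic can be described; retention.ms and segment.ms should show up under Configs:

kafka-topics --zookeeper localhost:2181 --describe --topic myTopic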
I implemented a TimestampExtractor in order to log the actual timestamp of each message:
import java.util.Date;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConsumerRecordOrWallclockTimestampExtractor implements TimestampExtractor {
    private static final Logger LOG = LoggerFactory.getLogger(ConsumerRecordOrWallclockTimestampExtractor.class);

    @Override
    public long extract(ConsumerRecord<Object, Object> consumerRecord) {
        LOG.info("TIMESTAMP : " + consumerRecord.timestamp() + " - Human readable : " + new Date(consumerRecord.timestamp()));
        // Fall back to wall-clock time if the record carries no valid timestamp
        return consumerRecord.timestamp() >= 0 ? consumerRecord.timestamp() : System.currentTimeMillis();
    }
}
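And this is roughly how I register the extractor in my Streams configuration (application id and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-replay-app"); // placeholder
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
streamsProps.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
        ConsumerRecordOrWallclockTimestampExtractor.class.getName());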
For testing, I sent 4 messages to my topic and got these 4 log messages:
2017-02-28 10:23:39 INFO ConsumerRecordOrWallclockTimestampExtractor:21 - TIMESTAMP : 1488295086292 - Human readable : Tue Feb 28 10:18:06 EST 2017
2017-02-28 10:24:01 INFO ConsumerRecordOrWallclockTimestampExtractor:21 - TIMESTAMP : 1483272000000 - Human readable : Sun Jan 01 07:00:00 EST 2017
2017-02-28 10:26:11 INFO ConsumerRecordOrWallclockTimestampExtractor:21 - TIMESTAMP : 1485820800000 - Human readable : Mon Jan 30 19:00:00 EST 2017
2017-02-28 10:27:22 INFO ConsumerRecordOrWallclockTimestampExtractor:21 - TIMESTAMP : 1488295604411 - Human readable : Tue Feb 28 10:26:44 EST 2017
So based on Kafka's retention policy, I expected two of my messages to be purged/deleted after 5 minutes (the 2nd and 3rd messages, since their timestamps are from Jan 1st and Jan 30th, well past the 48-hour retention; the 5 minutes comes from log.retention.check.interval.ms=300000 in the broker config below). But I kept consuming the topic for an hour, and every time I consumed it I got all 4 messages.
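To see what the broker actually keeps on disk, the partition directory can also be listed (the path depends on the broker's log.dirs setting; /tmp/kafka-logs is the default in the sample config):

ls -l /tmp/kafka-logs/myTopic-0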
kafka-avro-console-consumer --zookeeper localhost:2181 --from-beginning --topic myTopic
My Kafka broker config (server.properties) looks like this:
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
Am I doing something wrong, or am I missing something here?