
I would like to reset the offset of a topic mytopic (1 partition) for group ID testgroup1 to 0. However, it is not always possible. When I try to set the offset to 0 I get this message:

bash-4.4# kafka-consumer-groups.sh --bootstrap-server localhost:9092 --topic mytopic --group testgroup1 --reset-offsets --to-offset 0 --execute
[2021-06-04 09:23:30,854] WARN New offset (0) is lower than earliest offset for topic partition mytopic-0. Value will be set to 1365671 (kafka.admin.ConsumerGroupCommand$)

bash-4.4# kafka-topics.sh --bootstrap-server localhost:9092 --topic mytopic --describe 
Topic: mytopic  PartitionCount: 1   ReplicationFactor: 1    Configs: segment.bytes=1073741824
    Topic: mytopic  Partition: 0    Leader: 1001    Replicas: 1001  Isr: 1001

bash-4.4# kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-name mytopic --entity-type topics
Dynamic configs for topic mytopic are:
bash-4.4#

In the Kafka log I can see this after the whole topic was consumed; I'm not sure if it's really related:

[2021-06-04 10:18:36,130] INFO [Log partition=__consumer_offsets-19, dir=/kafka/logs] Deleting segment files LogSegment(baseOffset=0, size=634, lastModifiedTime=1598954190000, largestRecordTimestamp=Some(1585909899136)) (kafka.log.Log)
[2021-06-04 10:18:36,131] INFO Deleted log /kafka/logs/__consumer_offsets-19/00000000000000000000.log.deleted. (kafka.log.LogSegment)
[2021-06-04 10:18:36,132] INFO Deleted offset index /kafka/logs/__consumer_offsets-19/00000000000000000000.index.deleted. (kafka.log.LogSegment)
[2021-06-04 10:18:36,132] INFO Deleted time index /kafka/logs/__consumer_offsets-19/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)

It is not even possible to consume the topic again using this command:

kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --group testgroup1 --topic mytopic

I've read other similar questions, but I haven't found a reason why Kafka behaves like this, i.e. why the earliest offset is set to a different value and it is not possible to return to offset 0 again. Maybe it has something to do with data retention, but I've tried to set log retention to 3 years:

log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 86400000
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.max.compaction.lag.ms = 9223372036854775807
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]
log.dir = /tmp/kafka-logs
log.dirs = /kafka/logs
log.flush.interval.messages = 9223372036854775807
log.flush.interval.ms = null
log.flush.offset.checkpoint.interval.ms = 60000
log.flush.scheduler.interval.ms = 9223372036854775807
log.flush.start.offset.checkpoint.interval.ms = 60000
log.index.interval.bytes = 4096
log.index.size.max.bytes = 10485760
log.message.downconversion.enable = true
log.message.format.version = 2.7-IV2
log.message.timestamp.difference.max.ms = 9223372036854775807
log.message.timestamp.type = CreateTime
log.preallocate = false
log.retention.bytes = -1
log.retention.check.interval.ms = 300000
log.retention.hours = 26280
log.retention.minutes = null
log.retention.ms = null
log.roll.hours = 168
log.roll.jitter.hours = 0
log.roll.jitter.ms = null
log.roll.ms = null
log.segment.bytes = 1073741824
log.segment.delete.delay.ms = 6000
Michal Špondr

1 Answer


A Kafka topic with cleanup policy DELETE (the default "type" of topic) gets its data pruned based on the configured size/time retention. In your case the data simply does not exist in the topic anymore. The offset counter always moves forward, so old offsets like 0 no longer have any data behind them. Hope that clears things up.
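You can verify what the earliest offset actually is by querying the broker directly. A sketch using the stock tooling and the broker/topic from the question; newer Kafka releases also ship this as kafka-get-offsets.sh:

```shell
# Query the earliest still-available offset per partition (--time -2).
# Use --time -1 for the latest offset instead.
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 \
  --topic mytopic \
  --time -2
```

If this prints something like mytopic:0:1365671, it matches the value the reset-offsets warning fell back to: everything before that offset has been deleted by retention.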

Check whether your topic has a different retention configuration set at the topic level; topic-level overrides take precedence over the broker defaults.
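Topic-level overrides can be inspected and set with kafka-configs.sh. A sketch against the question's topic; the retention.ms value is illustrative:

```shell
# Show topic-level overrides (empty output means the broker defaults apply,
# as in the question's output above).
kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name mytopic

# Set a topic-level retention override of 7 days (604800000 ms, illustrative).
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name mytopic \
  --add-config retention.ms=604800000
```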

Ran Lupovich
  • Good to learn about the cleanup policy, thanks! However, I think the data remains in the topic but is not available in the topic _partition_, am I right? I am still able to read the topic data from the beginning using a new group ID. – Michal Špondr Jun 04 '21 at 09:24
  • The beginning of the partition... it does not mean that consumer-groups describe would show 0 as the current offset, and it does not mean it will give you pruned data – Ran Lupovich Jun 04 '21 at 09:28
  • My goal is to use offset 0, not the "earliest" offset, which might not be 0. Is it even possible with Kafka to start reading a topic from the beginning again without using a different group ID? – Michal Špondr Jun 04 '21 at 09:37
  • Yes, of course. Using from-beginning or latest takes effect only on the initial setup of a consumer group; after that the group lives inside Kafka (for the configured time) and members keep reading from where they left off. Secondly, you can change the current offset of a consumer group, like you did with the consumer-group reset-offsets command, but you cannot read offsets that have been pruned from the topic – Ran Lupovich Jun 04 '21 at 09:44
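The reset the commenter describes can use --to-earliest instead of --to-offset 0, which jumps to whatever offset is actually still available. A sketch using the question's group and topic:

```shell
# Reset the group's position to the earliest offset still present in the log.
# The consumer group must have no active members while resetting.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group testgroup1 --topic mytopic \
  --reset-offsets --to-earliest --execute
```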
  • It seems I need to find out what the retention configuration is, because I have log.retention.bytes = -1 and log.retention.hours = 26280, so I don't think the delete policy should kick in. But there are "Deleted log" messages in the Kafka log, which confuses me. – Michal Špondr Jun 04 '21 at 10:44
  • Run describe on the topic with the kafka-topics.sh tool – Ran Lupovich Jun 04 '21 at 10:46
  • Oh, I've noticed there is `offsets.retention.minutes` configuration in Kafka, too! https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#brokerconfigs_offsets.retention.minutes I'll investigate it. – Michal Špondr Jun 04 '21 at 10:50
  • That's related to the consumer group, not to your data in the topic being pruned... In your case you probably created the topic with some retention limit, which takes precedence over the broker configuration. Please share your topic configuration describe output for further assistance – Ran Lupovich Jun 04 '21 at 10:58
  • I've edited the post and added the Kafka topic description. – Michal Špondr Jun 04 '21 at 11:11
  • Thanks. The other possible reason for data being pruned is if its create timestamp is old, which is set by the producer. How is your data produced? Secondly, are the producer clock and the Kafka clock in sync? – Ran Lupovich Jun 04 '21 at 11:21
  • For example, in another ticket this week we found out the producer was creating new events with a create timestamp 51 years old, and they got pruned – Ran Lupovich Jun 04 '21 at 11:24
  • I've tried to print the timestamps of the messages: `kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning --property print.timestamp=true` and I got this date for the first event: 2020-05-08. So it's more than 1 year old data. – Michal Špondr Jun 04 '21 at 11:43
  • KAFKA_HOME/bin/kafka-configs.sh --zookeeper $ZK --describe --entity-name test-topic --entity-type topics – Ran Lupovich Jun 04 '21 at 11:52
  • Please use above command to get full config details – Ran Lupovich Jun 04 '21 at 11:52
  • I've added it to the main post, but the output is empty. – Michal Špondr Jun 04 '21 at 11:59
  • @Michal This answer solves your original question. If you would like to find the retention time of a topic, please create a new post – OneCricketeer Jun 04 '21 at 13:01