3

We have a topic with 5 partitions, and we choose the partition based on a checksum of the key. In some cases no key resolves to partition 3, so no offsets are ever committed for it. As a result, once the configured offset retention period has passed, the consumer's current offset for that partition shows as unknown. To resolve this, we thought we would set both log retention and offset retention at the topic level. In the topic config I see that retention.ms controls log retention, but I could not find a corresponding offset retention setting. Can someone please help with this?

Edit: `bin/kafka-topics.sh --zookeeper XXX --alter --topic XXXX --config retention.ms=86400000`

The command above sets the log retention time for a specific topic. But how can we specify the offset retention in the same command?

Divya Paulraj

3 Answers

4

Committed consumer offsets for all consumers and all topics are stored in a single internal "__consumer_offsets" topic. Therefore you cannot control offset retention individually per topic, I'm afraid.

N.B. I can see this being problematic when there are no messages for prolonged periods of time on one of your topic's partitions.

I found the following ticket that can be of help: https://issues.apache.org/jira/browse/KAFKA-3806

The first comment suggests committing offsets even when the consumer is making no progress (no new messages are arriving for a given partition), to avoid this exact problem:

you would want to keep committing the offsets even though they are not changing
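
A minimal sketch of that idea, assuming a poll loop that can call a helper on every iteration. `OffsetKeepAlive` and its methods are hypothetical names, not part of the Kafka client; the commit itself is passed in as a callback. With the Java client the callback would typically wrap `KafkaConsumer.commitSync(Map<TopicPartition, OffsetAndMetadata>)`, with positions read from `KafkaConsumer.position(TopicPartition)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper (not a Kafka class): remembers the current offset per
// partition and re-commits all of them once a refresh interval has elapsed,
// even if none of them has changed since the last commit.
final class OffsetKeepAlive {
    private final Map<Integer, Long> offsets = new HashMap<>();
    private final long refreshIntervalMs;
    // Stand-in for the real commit call (assumption: supplied by the caller).
    private final Consumer<Map<Integer, Long>> committer;
    private long lastCommitAtMs;

    OffsetKeepAlive(long refreshIntervalMs, long nowMs,
                    Consumer<Map<Integer, Long>> committer) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.lastCommitAtMs = nowMs;
        this.committer = committer;
    }

    // Record the current position of a partition, including idle ones.
    void track(int partition, long offset) {
        offsets.put(partition, offset);
    }

    // Call on every poll-loop iteration. Returns true if a commit was issued.
    boolean maybeRecommit(long nowMs) {
        if (offsets.isEmpty() || nowMs - lastCommitAtMs < refreshIntervalMs) {
            return false;
        }
        committer.accept(new HashMap<>(offsets)); // pass a copy to the callback
        lastCommitAtMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        OffsetKeepAlive keepAlive = new OffsetKeepAlive(60_000, 0,
                offsets -> System.out.println("re-committing " + offsets));
        keepAlive.track(3, 0L);           // idle partition 3, still at offset 0
        keepAlive.maybeRecommit(30_000);  // too early, nothing is committed
        keepAlive.maybeRecommit(60_000);  // interval elapsed, offsets are re-committed
    }
}
```

The refresh interval would need to be comfortably shorter than the broker's offset retention period for this to keep the idle partition's offset alive.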

Michal Borowiecki
  • I have another question. I tried setting offsets.retention.minutes in the Kafka configuration file to 1 minute to test whether the offsets turn unknown, and waited for more than an hour (the offset clean interval was 10 minutes). But nothing changed. – Divya Paulraj Jun 18 '17 at 17:40
  • I think offsets (like any topic) are removed one segment at a time (and never the active segment), so I think for the purpose of your tests you'd have to also ensure the offsets topic's segments are rolling. I'm not able to advise how to design this test properly. Here's a better description of how segment deletion works in general: https://stackoverflow.com/a/40251356/7897191 – Michal Borowiecki Jun 18 '17 at 18:12
  • Thank you Michal Borowiecki – Divya Paulraj Jun 19 '17 at 03:42
0

I think you're looking for log.retention.bytes.

However, the fact that there is no data at all within the retention period is something you should fix, either by decreasing the number of partitions or by using another algorithm to create the key.
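
Before changing the partition count or the key algorithm, it may help to check how the actual keys spread over the partitions. The sketch below is only an illustration: it assumes the partition is CRC32 of the UTF-8 key bytes modulo the partition count, which may differ from the checksum the asker uses, and `PartitionSpread` and the sample keys are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper (not part of Kafka): reports how a set of keys
// spreads over partitions under an assumed checksum-based scheme.
final class PartitionSpread {

    // Assumed scheme: CRC32 of the UTF-8 key bytes, modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the result is in [0, numPartitions)
        return (int) (crc.getValue() % numPartitions);
    }

    // Counts how many of the given keys land on each partition.
    static int[] countPerPartition(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample keys; substitute the real key set.
        List<String> keys = List.of("order-1", "order-2", "customer-7", "customer-8");
        int[] counts = countPerPartition(keys, 5);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p] + " key(s)"
                    + (counts[p] == 0 ? "  <- no sample key maps here" : ""));
        }
    }
}
```

A partition that no key maps to never receives messages, so no offsets are committed for it, which is the situation described in the question.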

jvwilge
  • Thanks jvwilge. Edit: `bin/kafka-topics.sh --zookeeper XXX --alter --topic XXXX --config retention.ms=86400000` The command above sets the log retention time for a specific topic. But how can we specify the offset retention in the same command? – Divya Paulraj Jun 06 '17 at 06:44
  • What exactly do you mean by that? Do you want the data added in the last X hours for example? – jvwilge Jun 06 '17 at 06:47
  • I want to configure log retention and offset retention for each Kafka topic. Is that possible and, if so, how? – Divya Paulraj Jun 06 '17 at 07:05
  • The settings for all topics can be changed in `server.properties`. Settings for an individual topic are changed with `kafka-topics.sh`. I'm not sure whether already created topics are altered when you change `server.properties`. – jvwilge Jun 06 '17 at 07:35
  • Yes, that's right. But I am looking for the specific configuration for offset retention using kafka-topics. The retention configuration available in the topic config is for log retention. – Divya Paulraj Jun 06 '17 at 09:25
0

You can configure offset retention in server.properties using the parameter "offsets.retention.minutes". The default value is 1440 (24 hours).

Offset retention is system-wide, so you cannot set it at the individual topic level.
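
As a rough sketch, the broker-side settings might look like this in server.properties. The 7-day value is only an illustrative choice, not a recommendation, and `offsets.retention.check.interval.ms` is shown at 10 minutes to match the clean interval mentioned in the comments above.

```properties
# Broker-wide: applies to the committed offsets of all consumer groups and topics.
# Illustrative value: keep committed offsets for 7 days (7 * 24 * 60 minutes).
offsets.retention.minutes=10080

# How often the broker checks for expired offsets (10 minutes).
offsets.retention.check.interval.ms=600000
```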

Hans Jespersen