
When I looked for a way to count the messages in a topic, this command seemed to work well:

kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -1

The only problem is that after I change the topic config to retention.ms=1000, and confirm the change by running kafka-topics --describe --zookeeper zookeeper1:2181 --topic rev-dly-upd, I can clearly see that the config is set to 1000:

Topic:rev-dly-upd   PartitionCount:8    ReplicationFactor:3 Configs:retention.ms=1000
    Topic: rev-dly-upd  Partition: 0    Leader: 159 Replicas: 159,96,160    Isr: 159,96,160
    Topic: rev-dly-upd  Partition: 1    Leader: 160 Replicas: 160,159,94    Isr: 94,160,159
    Topic: rev-dly-upd  Partition: 2    Leader: 94  Replicas: 94,160,95 Isr: 95,94,160
    Topic: rev-dly-upd  Partition: 3    Leader: 95  Replicas: 95,94,96  Isr: 95,96,94
    Topic: rev-dly-upd  Partition: 4    Leader: 96  Replicas: 96,95,159 Isr: 95,96,159
    Topic: rev-dly-upd  Partition: 5    Leader: 159 Replicas: 159,160,94    Isr: 159,94,160
    Topic: rev-dly-upd  Partition: 6    Leader: 160 Replicas: 160,94,95 Isr: 94,160,95
    Topic: rev-dly-upd  Partition: 7    Leader: 94  Replicas: 94,95,96  Isr: 95,96,94

Yet when I run kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -1, I still always get records returned. What could the reasons be?

Matthias J. Sax
uh_big_mike_boi
  • You need to wait an hour for the LogCleaner thread to run – OneCricketeer Jul 31 '18 at 04:21
  • offsets are not truncated when msgs are truncated. the *data* of those messages should be gone, however, the offsets will not be reused. i understand GetOffsetShell to be a tool to list the offsets of all partitions? did you try to actually consume the topics and see if the data is indeed there? – Marius Waldal Aug 01 '18 at 12:30
  • @cricket it was the same even days later – uh_big_mike_boi Sep 12 '18 at 10:55
  • Basically, if the data for an offset is missing, then the consumer just seeks forward to the next available one. The LogCleaner should be resetting the earliest offsets, but that thread can stop working and you need to monitor it from the running server logs. In any case it should give you an approximate count, assuming topic is not compacted. The alternative of consuming and doing line count on a topic isn't reliable 1) There can be newlines within data 2) console consumer never ends, so `wc` won't stop – OneCricketeer Sep 12 '18 at 13:22

1 Answer


Basically, I had to stop using kafka-run-class kafka.tools.GetOffsetShell to count the messages in a topic. If you google "how to count messages in kafka topic", a lot of posts will lead you to think that the above command, given the right arguments, gives you a count of total messages. However, with --time -1 it reports the latest offset of each partition, and offsets are not reset when messages expire, so if you have purged messages during the lifespan of the topic it will not give you an accurate count. Instead, you have to do something like open a console consumer, write the output to a text file, and then count the lines of that file with old-fashioned wc -l.
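
A minimal sketch of that consume-and-count approach, piping straight into wc -l rather than going through a text file. The function name count_topic_messages and the KAFKA_CONSOLE_CONSUMER override are illustrative names, not part of Kafka, and the 10-second idle timeout is an assumed value. It relies on the console consumer's --from-beginning and --timeout-ms options so that the consumer exits once no new message has arrived for the timeout, instead of running forever. As noted in the comments on the question, messages that contain newlines will inflate the count.

```shell
#!/usr/bin/env bash
# Count messages by consuming the whole topic and counting output lines.
# KAFKA_CONSOLE_CONSUMER is a hypothetical override for the consumer command;
# some distributions name it kafka-console-consumer.sh.
count_topic_messages() {
  local bootstrap="$1" topic="$2" timeout_ms="${3:-10000}"
  local consumer="${KAFKA_CONSOLE_CONSUMER:-kafka-console-consumer}"

  # --from-beginning: start at the earliest offset still retained.
  # --timeout-ms:     exit when no message arrives within this interval.
  # stderr is discarded because the consumer logs its exit there.
  "$consumer" --bootstrap-server "$bootstrap" --topic "$topic" \
      --from-beginning --timeout-ms "$timeout_ms" 2>/dev/null \
    | wc -l | tr -d ' '
}

# Example (broker address and topic taken from the question):
# count_topic_messages broker1:9092 rev-dly-upd
```

The count is only a snapshot: anything produced while the consumer is running is included, and a timeout that is too short for a slow cluster can end the run early.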

uh_big_mike_boi
  • Messages within a topic can't be deleted unless it's compacted, so what do you mean by "purged"? If you do `--time -1` and `--time -2`, you can take a look at the difference to count number of offsets/messages in the partitions – OneCricketeer Sep 12 '18 at 13:24
  • Purged just means I allowed the messages to exhaust their retention period. I force this by changing the retention period to 1 second, letting the messages be deleted, and then changing the retention setting back to what it had been. The way you are doing it with adjusting the time period is ok, but then I have to keep track of when it was last purged. And in a troubleshooting situation I could lose confidence in whether it was purged at all. Unless I had a really good auditing system set up that I didn't allow myself to bypass and no manual purges, which I don't have. – uh_big_mike_boi Sep 12 '18 at 13:39
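
The offset-difference idea from the first comment above can be sketched as follows. GetOffsetShell prints one topic:partition:offset line per partition; --time -2 gives the earliest offset and --time -1 the latest. The helper name count_from_offsets and the file names earliest.txt and latest.txt are made up for illustration. The result is an approximation: it assumes the topic is not compacted, and it only reflects a purge once the broker has actually deleted the expired data and advanced the earliest offsets.

```shell
#!/usr/bin/env bash
# Sum (latest - earliest) over all partitions from two GetOffsetShell outputs.
# Each input line is expected to look like: topic:partition:offset
count_from_offsets() {
  local earliest_file="$1" latest_file="$2"
  awk -F: '
    # First file: remember the earliest offset of each topic:partition.
    NR == FNR { earliest[$1 ":" $2] = $3; next }
    # Second file: add the difference for the matching partition.
    { total += $3 - earliest[$1 ":" $2] }
    END { print total + 0 }
  ' "$earliest_file" "$latest_file"
}

# Example against a live cluster (brokers and topic taken from the question):
# kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 \
#     --topic rev-dly-upd --time -2 > earliest.txt
# kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 \
#     --topic rev-dly-upd --time -1 > latest.txt
# count_from_offsets earliest.txt latest.txt
```

Because both numbers come from the broker at query time, there is no need to keep track of when the topic was last purged.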