
I have a Flink streaming application that consumes data from a Kafka topic with 3 partitions. Even though the application is running continuously and working without any obvious errors, I see lag in the consumer group for the Flink app on all 3 partitions.

./kafka-consumer-groups.sh --bootstrap-server $URL --all-groups --describe


GROUP     TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
group-1   topic-test 0          9566            9568            2               -               -               -
group-1   topic-test 1          9672            9673            1               -               -               -
group-1   topic-test 2          9508            9509            1               -               -               -

If I send new records, they get processed, but the lag still exists. I tried to view the last few records for partition 0 and this is what I got (omitting the message part):

./kafka-console-consumer.sh --topic topic-test --bootstrap-server $URL --property print.offset=true --partition 0 --offset 9560

Offset:9560
Offset:9561
Offset:9562
Offset:9563
Offset:9564
Offset:9565

The log-end-offset value is at 9568 and the current offset is at 9566. Why are these offsets not available in the console consumer and why does this lag exist?

There were a few instances where I noticed missing offsets. For example:

Offset:2344
Offset:2345
Offset:2347
Offset:2348

Why did the offset jump from 2345 to 2347 (skipping 2346)? Does this have something to do with how the producer is writing to the topic?

davyjones

2 Answers


You can describe your topic to check for any configuration that was set when it was created. If log compaction is enabled through log.cleanup.policy=compact, the behaviour at runtime will be different: the lag you see can be caused by the compaction lag settings, and missing offsets may be due to messages produced with a key but a null value that were removed by compaction.
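For example, assuming the same broker URL and topic name as in the question, the standard Kafka CLI tools should show whether cleanup.policy=compact (or any other override) is set on the topic; any override should appear in the Configs field of the output:

./kafka-topics.sh --bootstrap-server $URL --describe --topic topic-test

./kafka-configs.sh --bootstrap-server $URL --entity-type topics --entity-name topic-test --describe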

Configuring The Log Cleaner

  • The log cleaner is enabled by default. This will start the pool of cleaner threads. To enable log cleaning on a particular topic, add the log-specific property log.cleanup.policy=compact.

  • The log.cleanup.policy property is a broker configuration setting defined in the broker's server.properties file; it affects all of the topics in the cluster that do not have a configuration override in place. The log cleaner can be configured to retain a minimum amount of the uncompacted "head" of the log. This is enabled by setting the compaction time lag log.cleaner.min.compaction.lag.ms.

  • This can be used to prevent messages newer than a minimum message age from being subject to compaction. If not set, all log segments are eligible for compaction except for the last segment, i.e. the one currently being written to. The active segment will not be compacted even if all of its messages are older than the minimum compaction time lag.

  • The log cleaner can be configured to ensure a maximum delay after which the uncompacted "head" of the log becomes eligible for log compaction via log.cleaner.max.compaction.lag.ms (a per-topic example follows this list).
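The settings above are broker-level defaults. As a sketch (the lag values are purely illustrative), the per-topic equivalents, which drop the log./log.cleaner. prefixes, can be set with kafka-configs.sh:

./kafka-configs.sh --bootstrap-server $URL --entity-type topics --entity-name topic-test --alter --add-config cleanup.policy=compact,min.compaction.lag.ms=60000,max.compaction.lag.ms=300000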

ChristDist
  • Log compaction is not enabled on the topic. I think this behaviour is related to a transactional producer inserting control batches. https://stackoverflow.com/questions/56182606/in-kafka-when-producing-message-with-transactional-consumer-offset-doubled-up – davyjones Jun 20 '22 at 19:35

The lag is calculated from the latest offset committed by the Kafka consumer (lag = log-end offset - latest committed offset). In general, Flink commits Kafka offsets when it performs a checkpoint, so there is always some lag if you check it with the consumer-groups command.

That doesn't mean that Flink hasn't consumed and processed all the messages in the topic/partition; it just means that it has not committed the offsets yet.
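As an illustration of that relationship (a minimal sketch, assuming Flink 1.11+ where checkpointing can be enabled through flink-conf.yaml; the interval is only an example), offsets are committed back to Kafka on checkpoint completion, so they only show up in kafka-consumer-groups.sh after each successful checkpoint:

# flink-conf.yaml (illustrative)
execution.checkpointing.interval: 60s

Shortly after a checkpoint completes, the reported lag should drop back towards zero, although it may not reach zero if the last offsets in the partition are transaction markers, as discussed in the comments below.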

Gerard Garcia
  • You are correct. But as you can see from the data that I shared, it seems that those particular offsets don't exist at all in the topic (9566, 9567, 9568). Otherwise they would have shown up while I ran the kafka-console-consumer command. – davyjones Jun 17 '22 at 13:17
  • @davyjones if the topic is compacted, then you wouldn't see compacted offsets. Or if transaction support is enabled (which it is by default in the latest Kafka) then transaction markers might be filtered out – OneCricketeer Jun 17 '22 at 14:39
  • @OneCricketeer I think you are right about the transactional producer causing this. I found another post with a similar question; it seems to have something to do with control batches. https://stackoverflow.com/questions/56182606/in-kafka-when-producing-message-with-transactional-consumer-offset-doubled-up https://kafka.apache.org/documentation/#controlbatch – davyjones Jun 20 '22 at 19:38
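One way to confirm this on the broker side is to dump the partition's log segment with the standard kafka-dump-log.sh tool and look for control records (the log directory and segment file name below are placeholders for the broker's actual log.dirs location):

./kafka-dump-log.sh --files /path/to/kafka-logs/topic-test-0/00000000000000000000.log --deep-iteration --print-data-log

Batches marked isTransactional: true and isControl: true (for example an endTxnMarker: COMMIT entry) take up offsets in the log but are never delivered to consumers, which would account for both the skipped offsets and the small residual lag at the end of each partition.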