
We're building a monitoring system for our Spring Boot application using Micrometer. The gathered metrics are published to an Elasticsearch instance through micrometer-registry-elastic. Everything works, except for:

  • kafka_consumer_fetch_manager_records_lag is always 0, even when I'm sure there's lag on the consumer group
  • kafka_consumer_fetch_manager_records_lag_avg is always 0, even when I'm sure there's lag on the consumer group
  • kafka_consumer_fetch_manager_records_lag_max has a value different from 0 only on the first measurement exposed.

All other metrics such as kafka_consumer_fetch_manager_records_lead are correctly set.
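
For reference, this is my understanding of what the per-partition lag and lead gauges are supposed to represent, as a minimal sketch. The class name, method names and offsets below are made up for illustration; this is not the kafka-clients implementation, and it ignores details such as the isolation level.

```java
public class LagLeadSketch {

    // Lag: how many records sit between the consumer's position
    // and the end of the partition log.
    public static long recordsLag(long logEndOffset, long position) {
        return logEndOffset - position;
    }

    // Lead: how many records sit between the start of the partition log
    // and the consumer's position.
    public static long recordsLead(long logStartOffset, long position) {
        return position - logStartOffset;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for a single partition.
        long logStartOffset = 100L;
        long logEndOffset = 1_500L;
        long position = 1_200L;

        System.out.println("records-lag  = " + recordsLag(logEndOffset, position));    // 300
        System.out.println("records-lead = " + recordsLead(logStartOffset, position)); // 1100
    }
}
```

With a consumer that is behind the end of the log, I would therefore expect the lag value to be greater than 0, just as the lead is.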

Versions involved:

  • spring-boot:2.5.4
  • micrometer:1.7.3
  • micrometer-registry-elastic:1.7.3
  • spring-kafka:2.7.6
  • kafka-clients:2.7.1
  • Kafka broker: 2.7.0

I debugged the entire setup and found no evidence of an error: MicrometerConsumerListener is created correctly, as are KafkaClientMetrics and all the Sensor instances. I have no idea what the problem is; we have no particular customization and no error messages are logged. It seems that no samples with a value other than 0 are recorded for the metrics above, but I'm quite sure there is lag on the broker because I verified it with the command-line tool directly on the broker.

Any thoughts? Thanks a lot

tommaso.normani
  • Are messages arriving in the topics? Also can you provide any code or configurations? – Fermi-4 Oct 11 '21 at 13:26
  • As stated above, `consumer lag` is increasing in the broker, which means that messages are arriving on the topic. Also, `kafka_consumer_fetch_manager_records_lead` is greater than 0, which means that consumers are processing records. Sadly, I don't think any code or configuration would be useful without a hint on what to provide :) Btw, everything is autoconfigured via Spring Boot – tommaso.normani Oct 11 '21 at 13:35
  • Looks like a `kafka-clients` issue - I see it as 0.0 in VisualVM (MBean plugin) too. – Gary Russell Oct 11 '21 at 14:22
  • @GaryRussell thank you. May I file an issue to apache-kafka to do some further investigation? Is this https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka the proper channel? – tommaso.normani Oct 12 '21 at 08:08
  • You don't need to ask for my permission; I am not involved with Kafka development. – Gary Russell Oct 12 '21 at 13:22
  • @tommaso.normani Did you submit the issue, or did you solve your problem some other way? Looking at Kafka issues I found https://issues.apache.org/jira/browse/KAFKA-5855, which seems similar but turned out to be an issue with the app, not with Kafka itself, but I am also not confident that lag is always 0 – ThanksForAllTheFish Jan 06 '22 at 07:09
  • @ThanksForAllTheFish I created a simple test case before submitting the issue, but I can't reproduce the problem there: `kafka.consumer.fetch.manager.records.lag` behaves as expected, so I'm still working on this in my spare time. For now I've adopted a workaround based on the `records-lead` metric – tommaso.normani Jan 07 '22 at 16:26

0 Answers