We need to monitor disk space usage for Kafka brokers running in an AWS MSK cluster.
Kafka emits several metrics that can be used to monitor various aspects, but I was unable to find any specific metric that reports disk usage for each broker.
Although it depends on the message and log retention policy and on the rate at which new events arrive on the various topics, how can we predict whether our brokers will run out of disk in the next 1 day (or whatever duration we choose as a safe threshold)?
If we could monitor the average event payload size and the number of events per minute (or hour), that would help in making this calculation. I looked through the Apache Kafka documentation for available metrics, but I was unable to find these either.
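To make the calculation concrete, here is a rough sketch of what I have in mind. The function name, the free-space figure, and the sample numbers are all hypothetical, and it assumes a steady ingest rate with no retention-based deletion, so it should give a pessimistic (early) estimate rather than an exact one.

```python
def hours_until_full(free_bytes, avg_event_bytes, events_per_minute):
    """Estimate hours until a broker's disk fills at a steady ingest rate.

    Assumption: no log segments are deleted by retention during the window,
    so the real time-to-full should be no shorter than this estimate.
    """
    bytes_per_hour = avg_event_bytes * events_per_minute * 60
    if bytes_per_hour <= 0:
        # Nothing is being written, so the disk never fills.
        return float("inf")
    return free_bytes / bytes_per_hour


# Hypothetical inputs: 200 GiB free, 2 KiB average event, 50,000 events/minute.
free = 200 * 1024**3
eta = hours_until_full(free, avg_event_bytes=2048, events_per_minute=50_000)

SAFE_THRESHOLD_HOURS = 24  # the "next 1 day" threshold from the question
if eta < SAFE_THRESHOLD_HOURS:
    print(f"WARNING: disk may fill in about {eta:.1f} hours")
else:
    print(f"OK: roughly {eta:.1f} hours of headroom")
```

The two inputs I am missing are exactly the ones mentioned above: the average event size and the event rate per broker. The per-broker rate would also need to include replica traffic, which this sketch does not model.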
avg(rate(kafka_server_BrokerTopicMetrics_FifteenMinuteRate{ name="BytesInPerSec"}[1h]))/avg(rate(kafka_server_BrokerTopicMetrics_FifteenMinuteRate{ name="BytesOutPerSec"}[1h]))
I tried the PromQL query above. If anyone can suggest a healthy range for BytesIn/BytesOut, it could be used with confidence.
All pointers are highly appreciated.