
How can the backlog of a remote function in Flink StateFun be negative?

In a pipeline I'm currently working on, I regularly see this behaviour when the cluster is under stress and some functions are under backpressure: the backlog metrics sometimes become negative and stay there. The following chart shows this phenomenon.

[Image: back pressure chart]

According to the docs, the metric numBacklog is "The number of pending messages to be sent".

I just don't understand how this can be negative. Does anyone know?

Chr1s
  • The number of backlogged messages cannot be negative. It increases if the address is blocked and then immediately resets to zero once it is available. I suspect there is a problem in how the RequestReplyFunction resets the counter metric, but I am unable to reproduce it locally. What version of StateFun are you running? This might be resolved in a more recent release. – Seth Oct 06 '21 at 19:43
  • It would also be good to know which metrics reporter you are using. – Seth Oct 06 '21 at 19:43
  • Other than the metric appearing to be negative, does the application make progress overall? Just trying to understand whether it's only a reporting issue. – Igal Oct 06 '21 at 21:26
  • Update: I can confirm it is a bug in how the metric is reported. You can assume in this case the actual value is zero. I've opened a ticket to track and fix the problem: https://issues.apache.org/jira/browse/FLINK-24464 – Seth Oct 06 '21 at 23:26
  • OK, thanks for confirming. I thought I just hadn't understood the meaning of the metric. It was with StateFun 3.1.0, using the Prometheus metrics reporter. – Chr1s Oct 08 '21 at 08:06
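
For readers wondering how a counter-style metric can end up below zero at all, here is a minimal sketch in Python. It is not the StateFun implementation: `BacklogCounter`, `on_blocked`, `on_unblocked` and `reported_backlog` are hypothetical names made up for illustration, and the unmatched decrement is only an assumed failure mode. It also shows the workaround suggested in the comments, treating a negative reading as zero.

```python
class BacklogCounter:
    """Hypothetical inc/dec counter; NOT the actual StateFun code."""

    def __init__(self):
        self.value = 0

    def on_blocked(self, n=1):
        # n messages are queued while the target address is blocked.
        self.value += n

    def on_unblocked(self, n):
        # n messages are flushed once the address is available again.
        self.value -= n


def reported_backlog(raw):
    # Workaround on the reading side: treat negative readings as zero.
    return max(0, raw)


counter = BacklogCounter()
counter.on_blocked(3)
counter.on_unblocked(3)  # balanced: the counter is back at 0
counter.on_unblocked(2)  # assumed unmatched decrement: drifts below zero

print(counter.value, reported_backlog(counter.value))
```

Because nothing in such a counter ever re-synchronises it with the real queue length, a single unmatched decrement would persist, which would match the "goes negative and stays there" pattern in the chart.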
