
I'd like to detect when an IP is jumping between nodes. Each time an IP jumps, the node announces it, and that is visible via this Prometheus metric: metallb_speaker_announced

This metric will show the following info: metallb_speaker_announced{app_kubernetes_io_component="speaker", app_kubernetes_io_instance="metallb-system", app_kubernetes_io_name="metallb", instance="10.147.52.129:7472", ip="192.168.1.21", job="kubernetes-pods", kubernetes_namespace="metallb", kubernetes_node_name="node01", kubernetes_pod_name="metallb-system-spk-5whj5", node="node01", protocol="layer2", service="metallb/service-1"}

What would the PromQL expression look like if we wanted to detect whether an IP has been announced at least 3 times from at least 2 different nodes in the last 5 minutes?

For better context: metallb_speaker_announced events are triggered by different types of events, and they are harmless as long as the Kubernetes node making the announcement is the same. If the Kubernetes node making the announcement alternates, that is a relevant problem that could be the consequence of things like the node having a flapping NIC or other conditions.

carrotcakeslayer

1 Answer


I'm unable to repro your example as I don't have MetalLB and a bunch of nodes but...

If we can assume that metallb_speaker_announced only triggers on a new node, the first firing will be the 1st node and the second firing will be a different, 2nd node. Any subsequent firings e.g. 3rd is either from the 1st node again or from a 3rd node. So, 2+ firings is guaranteed to be >=2 nodes.

Then, I think you can use sum_over_time(metallb_speaker_announced{}[5m]) to sum all the announcements for the last 5 minutes.

And then you can use sum by(ip) (sum_over_time(metallb_speaker_announced{}[5m])) to get the results summarized by IP.

And then you can use sum by(ip) (sum_over_time(metallb_speaker_announced{}[5m])) >= 3 to filter the results down to those IPs that occurred >= 3 times.
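If the assumption above doesn't hold (i.e. the same node can announce the same IP repeatedly), one possible variation is to count the distinct nodes per IP instead of relying on the announcement count alone. This is an untested sketch. It assumes that metallb_speaker_announced is greater than 0 while a node is announcing an IP, and that the node label (as in the sample series from the question) identifies the announcing node:

```promql
# Number of distinct nodes that announced each IP in the last 5 minutes.
# The inner count collapses all series of one (ip, node) pair into one element;
# the outer count then counts those pairs per ip.
count by (ip) (
  count by (ip, node) (
    max_over_time(metallb_speaker_announced[5m]) > 0
  )
) >= 2

# Combined with the >= 3 filter from above: keep only IPs that satisfy both.
# Assumption: sum_over_time adds up sample values per scrape, so it only
# approximates "number of announcements".
(sum by (ip) (sum_over_time(metallb_speaker_announced[5m])) >= 3)
and on (ip)
(count by (ip) (count by (ip, node) (max_over_time(metallb_speaker_announced[5m]) > 0)) >= 2)
```

The first query alone should already flag an IP that moved between nodes within the window. The second narrows that down with the threshold, and the 5m window and the thresholds can be tuned to how noisy the environment is.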

DazWilkin
    Hi DazWilkin. I'll try to add info to the initial question. The initial assumption is not entirely correct, since a metallb_speaker_announced can be triggered by several events, which is harmless. The problem comes when the same IP is announced by different nodes, since that could be because of a flapping NIC. So the "tricky" part of this question is to have a way to alert only when the same IP is announced by more than one node, x number of times in a given time. – carrotcakeslayer Mar 02 '23 at 18:34