
Our microservices require a set of configurations, which a separate system (call it the publisher) broadcasts to the hosts whenever a configuration is updated.

The receiving hosts publish the following metric:

{
  "host": "h1",
  "configName": "c1",
  "configNameVersion": "v1"
}

There can be a delay in pushing these configs to all the hosts, so the hosts may be in an inconsistent state for some time. We want to capture this inconsistent state as Yes/No in Grafana.

This is easy to do with a SQL query (if the distinct count of configNameVersion across hosts is greater than 1 for any configName, the state is inconsistent):

  select configName, count(distinct configNameVersion) as "version_count"
  from table_name
  group by configName
  having count(distinct configNameVersion) > 1
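To make the intended check concrete, here is a minimal Python sketch of the same logic as the query above. The sample data and the function name `inconsistent_configs` are hypothetical and only illustrate the rule; this is not how Prometheus evaluates anything.

```python
from collections import defaultdict

# Hypothetical samples shaped like the metric above.
samples = [
    {"host": "h1", "configName": "c1", "configNameVersion": "v1"},
    {"host": "h2", "configName": "c1", "configNameVersion": "v2"},
    {"host": "h1", "configName": "c2", "configNameVersion": "v7"},
    {"host": "h2", "configName": "c2", "configNameVersion": "v7"},
]

def inconsistent_configs(samples):
    """Return {configName: distinct version count} for configs seen with more than one version."""
    versions = defaultdict(set)
    for s in samples:
        versions[s["configName"]].add(s["configNameVersion"])
    return {name: len(v) for name, v in versions.items() if len(v) > 1}

result = inconsistent_configs(samples)
print(result)
print("Inconsistent:", "Yes" if result else "No")
```

Because the versions are only collected into sets, this works with string versions such as "v1"; no numeric conversion is needed.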

How can I express the same check in Prometheus and show it on a Grafana dashboard?

Assume the publisher system doesn't publish any metrics.

Any alternative idea for solving this (with minimum criticality), or a pointer to relevant documentation or an example, would be much appreciated. Feel free to comment if I should add more information :)

Bishnu
  • Be specific about what you want to visualize: that there is some inconsistent state, how many inconsistent states there are, how many hosts are in an inconsistent state, a list of hosts in an inconsistent state, ... I would suggest providing a wireframe of the final dashboard with its panels and their dimensions; that can be the basis for defining the data structure. – Jan Garaj May 26 '22 at 05:41
  • How do you treat edge cases, e.g. when you don't know the state of a host? The time dimension is missing from your question: querying the whole DB just to get distinct counts is not very efficient, and it will get slower and slower over time. – Jan Garaj May 26 '22 at 05:45
  • The queries would be time-bound, i.e. they will consider only the last 1-2 min of data, so they should not be slow. As I have already mentioned, I just want to capture the consistency state as Yes/No (no other info; how many hosts are inconsistent, the list of hosts, etc. do not matter). Thanks. – Bishnu May 26 '22 at 06:40

1 Answer


This is only an idea: it may not work as is, so you may still need to refine it.

Store the data in Prometheus with this structure:

metric name: config_name_version
labels: host=h1, config_name=c1
value: 1 (an integer, not the string v1)
time: timestamp

Use math: the population standard deviation, which is available as the Prometheus aggregation operator stddev.

If all version values are the same, the standard deviation is 0 (e.g. stddev(100,100,100,100) = 0). If even a single value differs, it is no longer 0 (e.g. stddev(101,100,100,100) = 0.433). Of course, you need to write this in PromQL with grouping per config_name, e.g.:

stddev by (config_name) (config_name_version{})
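The arithmetic behind this can be checked outside Prometheus. The following is a rough Python sketch that mimics the per-config_name grouping using the standard library's `statistics.pstdev` (population standard deviation); the sample values and the helper name `stddev_by_config` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical samples: (host, config_name, numeric version value).
samples = [
    ("h1", "c1", 101), ("h2", "c1", 100), ("h3", "c1", 100), ("h4", "c1", 100),
    ("h1", "c2", 100), ("h2", "c2", 100), ("h3", "c2", 100), ("h4", "c2", 100),
]

def stddev_by_config(samples):
    """Group version values by config_name and return the population stddev of each group."""
    groups = defaultdict(list)
    for _host, config_name, version in samples:
        groups[config_name].append(version)
    return {name: pstdev(values) for name, values in groups.items()}

for name, sd in stddev_by_config(samples).items():
    # A non-zero deviation means at least one host reports a different version.
    print(name, round(sd, 3), "inconsistent" if sd != 0 else "consistent")
```

Note that this only works when the version is stored as a number; a string version such as "v1" would first have to be mapped to a numeric value.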

Grafana will apply the time range configured on the dashboard.

You can translate the numeric values to YES/NO strings at the Grafana level (the "value mapping" feature). You also have the host label, so you can add more filters to the dashboard (e.g. dashboard variables for host and config name selection) to make it more user friendly, to show hosts with old versions, to visualize updates over time, ...

Jan Garaj
  • Yes, with an integer value for the version we can solve the problem with the stddev function, but in my case the version is a string. – Bishnu May 27 '22 at 06:37