
I'm looking for information on how the "up" metric is calculated by Prometheus:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.

How does Prometheus determine when the instance is healthy?

I'm using Apache Cassandra with Prometheus, and from time to time the "up" metric shows the instance as down, even though Cassandra is working fine.

uszychaha

2 Answers


From the docs:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.

In other words, up is reported per scrape target (exporter) and indicates whether that exporter was reachable when Prometheus scraped it.

Pang
Elad Amit
  • Hmm, you copied and pasted exactly what I pasted from the docs. Thanks, but my question is still unanswered: how does the scraper determine that the instance is "up"? – uszychaha Mar 17 '19 at 16:26
  • It's not a calculation: if the scrape succeeded, up=1; if it failed (e.g. 4xx, 5xx, timeout, unreachable, unroutable), up=0. – Elad Amit Mar 17 '19 at 18:26
  • Do you know where I can find the scrape logic responsible for the "up/down" metric? Cassandra is a NoSQL database, and I don't understand where Prometheus gets these 4xx, 5xx or timeout values from, or how it knows what to check. – uszychaha Mar 19 '19 at 20:45
  • Prometheus scrapes what you tell it to scrape (the port and targets in the discovery config); what it hits depends on the exporter you set up for Cassandra. – Elad Amit Mar 20 '19 at 11:58

Prometheus automatically adds the up metric, alongside a few other metrics (such as scrape_duration_seconds, scrape_samples_scraped and scrape_series_added), each time it scrapes a configured scrape target - see these docs for more details. The up metric is set to 1 after each successful scrape and to 0 otherwise. It can be set to 0 in the following cases:

  • When the scrape target was unreachable during the scrape.
  • When the target didn't return a response within the configured timeout. The timeout is set via the scrape_timeout option, which defaults to 10 seconds. See more details about this option here.
  • When a network issue during the scrape prevented it from completing.
  • When the scrape target returns an incorrect or incomplete response. The response must contain metrics with values in the Prometheus text exposition format.
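
The cases listed above can be approximated with a short Python sketch. This is not Prometheus's actual code: the names up_from_response, body_parses and probe are hypothetical, the regular expression is only a rough stand-in for the real text exposition parser, and the sketch assumes that only an HTTP 200 response counts as a successful scrape.

```python
import http.client
import re
import urllib.request

# Rough approximation of one sample line: name{labels} value [timestamp].
# The real exposition format parser is stricter than this.
_SAMPLE = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*(\{.*\})?\s+(\S+)(\s+-?\d+)?$')


def body_parses(body: str) -> bool:
    """Return True if every non-comment line looks like 'name value'."""
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            return False
        try:
            float(match.group(2))  # accepts 1, 2.5, NaN, +Inf, ...
        except ValueError:
            return False
    return True


def up_from_response(status: int, body: str) -> int:
    """Map an HTTP status and body to the value 'up' would get in this sketch."""
    if status != 200:
        return 0  # e.g. 4xx or 5xx from the exporter
    return 1 if body_parses(body) else 0


def probe(url: str, timeout: float = 10.0) -> int:
    """Fetch url once; unreachable targets, timeouts and HTTP errors give 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return up_from_response(resp.status, body)
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return 0
```

Calling probe() against the exporter URL that Prometheus is configured to scrape, from the Prometheus host, can help show whether an intermittent up=0 comes from the exporter endpoint rather than from Cassandra itself.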

There may be other reasons for a failed scrape. The most recent scrape error for each target can be inspected on the http://prometheus-host:9090/targets page, in the error column. See, for example, http://demo.robustperception.io:9090/targets .
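
The information on the targets page is also available from the Prometheus HTTP API at /api/v1/targets, which is handy for checking it from a script. A minimal sketch follows; the helper names down_targets and fetch_targets are hypothetical, and prometheus-host:9090 is the placeholder address used above.

```python
import json
import urllib.request


def down_targets(payload: dict) -> list:
    """Return (scrapeUrl, lastError) for each active target whose health
    is not 'up' (that is, 'down' or 'unknown')."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str = "http://prometheus-host:9090",
                  timeout: float = 10.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

For example, down_targets(fetch_targets()) lists each unhealthy target together with its last scrape error, which usually names the reason (a timeout, a refused connection, an unexpected HTTP status, and so on).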

valyala