I am trying to get an exact count of an event in a Grafana visualization, using Prometheus as the time-series database, but the counter shows incorrect values. I get a higher count for a 2-day range than for a 7-day range, which definitely points to something being wrong.

First I used a single-stat visualization with this PromQL query:

sum(increase(http_server_requests_seconds_count[$__range]))

P.S.

I have also tried the following: sum(increase(http_server_requests_seconds_count[1m])). This also gives incorrect counts.

I have tried the same with a graph, using the legend to show totals in a table. This also gives incorrect counts.

Please let me know the best way to show counts that stay reliable when the time range changes.

My panel JSON:

{
    "options": {
        "colorMode": "value",
        "fieldOptions": {
            "calcs": [
                "lastNotNull"
            ],
            "defaults": {
                "mappings": [],
                "thresholds": {
                    "mode": "absolute",
                    "steps": [{
                        "color": "green",
                        "value": null
                    }]
                }
            },
            "overrides": [],
            "values": false
        },
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto"
    },
    "pluginVersion": "6.6.1",
    "targets": [{
        "expr": "sum(increase(http_server_requests_seconds_count[$__range]))",
        "hide": false,
        "instant": true,
        "refId": "A"
    }],
    "timeFrom": null,
    "timeShift": null,
    "title": "Total Number of Requests",
    "type": "stat"
}
user7510999

2 Answers

This works for me:

sum(increase(http_request_duration_seconds_count{ecs_cluster=~"$ecs_cluster", instance_id=~"$instance_id"}[$__range]))

I enabled the Instant query option and set the calculation to Last (not null).
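To make the instant-query setup more concrete, here is a minimal Python sketch of the request it roughly corresponds to: a single evaluation of the expression at the end of the dashboard range via the Prometheus `/api/v1/query` endpoint, which returns one value instead of a series of steps. The function names, the `localhost:9090` address, and the timestamp are placeholders of mine, and I am assuming `$__range` is substituted as a duration in seconds. It only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_increase_expr(metric, range_seconds):
    """Return the PromQL expression with the range substituted in seconds."""
    return f"sum(increase({metric}[{range_seconds}s]))"


def build_instant_query_url(base_url, metric, range_seconds, end_ts):
    """Build an instant-query URL evaluated once, at the end of the range."""
    params = urlencode({
        "query": build_increase_expr(metric, range_seconds),
        "time": end_ts,  # Unix timestamp of the right edge of the dashboard range
    })
    return f"{base_url}/api/v1/query?{params}"


if __name__ == "__main__":
    # Hypothetical values: a 2-day dashboard range against a local Prometheus.
    print(build_instant_query_url(
        "http://localhost:9090",
        "http_request_duration_seconds_count",
        2 * 24 * 3600,
        1595952000,
    ))
```

Opening the printed URL against your own Prometheus for a 2-day and a 7-day range is a quick way to check whether the odd numbers come from Prometheus itself or from the panel configuration.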

Here is the panel JSON:

{
  "cacheTimeout": null,
  "datasource": "Prometheus",
  "description": "",
  "fieldConfig": {
    "defaults": {
      "custom": {},
      "unit": " requests",
      "decimals": 0,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "blue",
            "value": null
          }
        ]
      },
      "mappings": [],
      "nullValueMode": "connected"
    },
    "overrides": []
  },
  "gridPos": {
    "h": 2,
    "w": 5,
    "x": 0,
    "y": 4
  },
  "id": 4,
  "interval": null,
  "links": [],
  "maxDataPoints": 100,
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": [
        "lastNotNull"
      ],
      "fields": ""
    },
    "orientation": "horizontal",
    "textMode": "auto",
    "colorMode": "value",
    "graphMode": "none",
    "justifyMode": "auto",
    "fieldOptions": {
      "calcs": [
        "lastNotNull"
      ]
    }
  },
  "pluginVersion": "7.1.0",
  "targets": [
    {
      "expr": "sum(increase(http_request_duration_seconds_count{ecs_cluster=~\"$ecs_cluster\", instance_id=~\"$instance_id\"}[$__range]))",
      "hide": false,
      "instant": true,
      "interval": "",
      "intervalFactor": 1,
      "legendFormat": "",
      "refId": "A"
    }
  ],
  "timeFrom": null,
  "timeShift": null,
  "title": "",
  "type": "stat"
}

trallnag
  • This is also not working. I get different counts after refreshing with the same range: sometimes the count is quite low, sometimes high. – user7510999 Jul 28 '20 at 15:31
  • Check your json against mine – trallnag Jul 28 '20 at 15:44
  • Also, is your Prometheus set up correctly? Try out your query in the Prometheus web interface. It should show something like the picture I attached. If your graph ever goes down, you know your Prometheus is the problem (or, more concretely, the targets Prometheus scrapes) – trallnag Jul 28 '20 at 15:47
  • Thanks for the JSON. The JSON seems fine except for 'interval' and 'intervalFactor'. I have added this to the actual question. I will check the Prometheus graph – user7510999 Jul 28 '20 at 16:07
  • Checked the Prometheus graph for that query. It is indeed going up and down. I have attached it to the question. – user7510999 Jul 28 '20 at 16:13
  • @user7510999 What kind of targets are you scraping with Prometheus? Anything running in multiple processes/instances, for example a Python web app with a pre-fork server like Gunicorn? Or are you scraping through a load balancer? – trallnag Jul 28 '20 at 16:21
  • I am not much on the DevOps side, but we have REST web services that run as multiple instances in Kubernetes pods. They are fronted by a load balancer. I think Prometheus uses Kubernetes service discovery to find the instances – user7510999 Jul 28 '20 at 16:38
  • Well, that all sounds good to me. And as you have probably already guessed, displaying the total number of requests in the selected time range is a very common use case, and the query/panel I posted definitely works. So I would investigate whether the scraping and service discovery are actually working as expected. You want to see a line that only ever goes up or stays the same with the query `sum(increase(metric_name[]))` – trallnag Jul 28 '20 at 20:11
  • Thank you very much for your time and help. I have escalated this to DevOps and will post a solution here when they resolve it. – user7510999 Jul 29 '20 at 08:33
  • @user7510999 any follow-up? – trallnag Aug 13 '20 at 17:30
  • Investigation is ongoing, but we found that Prometheus is fine. The issue lies with Grafana rather than Prometheus. It happens more often when Grafana runs locally and connects to Prometheus in the cloud than when both run in the cloud. We are still monitoring whether it happens when both are in the cloud. – user7510999 Aug 15 '20 at 10:49
  • @trallnag I have a beginner's question that I can't find answered in the documentation. I'm creating a graph over time, but I see that it accumulates. Is everything in Prometheus cumulative? That is, is it possible to show on a timeline graph exactly how many HTTP requests were made in each hour? – jcarlosweb Jun 13 '21 at 22:10
  • @jcarlosweb, counters in Prometheus are cumulative. They only ever go up (and are reset from time to time). Gauges are not cumulative; they store the current state. For counters you have to apply rate(). I don't know of a simple way to get a graph like the one you are requesting. Maybe set the step to 1h in Grafana, and the min interval to 1h as well – trallnag Jun 14 '21 at 08:37
  • Thank you very much for answering, I really appreciate it. Here is a clear example in this image: https://i.imgur.com/4r4lB9u.png. Where it says 2, I use this formula: `increase(myapp_request_custom_durations_histogram_seconds_count[$__range])` – jcarlosweb Jun 14 '21 at 21:33
  • Grafana is suddenly returning empty results when `instant` is enabled. This was working fine a few days ago. If we disable `instant`, the data fetched is not accurate. Any tips for debugging why `instant` is not working? – manjunath kallannavar Sep 29 '22 at 06:22

Prometheus may return inaccurate results from the increase() function because of its chosen data model - see this issue for details.

If you need accurate results, the following options exist:

  • Use offset. Try something like the following: sum(http_server_requests_seconds_count - http_server_requests_seconds_count offset $__range). Note that this approach works only if the metric http_server_requests_seconds_count was not reset to 0 (a counter reset) within the given time range.
  • Use the increase() function from MetricsQL. It returns accurate values - see these docs for details.
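To illustrate the counter-reset caveat on the offset approach, here is a small Python sketch over a made-up list of counter samples. The function names and the sample values are mine, purely for illustration. It compares a plain "last minus first" subtraction (what the offset query amounts to) with a reset-aware sum of deltas. It deliberately leaves out the extrapolation that Prometheus's increase() also applies, so it is not a reimplementation of either engine.

```python
def naive_difference(samples):
    """Last value minus first value, as in `metric - metric offset <range>`."""
    return samples[-1] - samples[0]


def reset_aware_increase(samples):
    """Sum the deltas, treating any drop in value as a restart from zero."""
    total = 0
    prev = samples[0]
    for cur in samples[1:]:
        if cur < prev:
            # Counter reset: everything counted since the restart is new.
            total += cur
        else:
            total += cur - prev
        prev = cur
    return total


if __name__ == "__main__":
    # Hypothetical samples: the process restarts between 170 and 5.
    samples = [100, 130, 170, 5, 40]
    print(naive_difference(samples))      # -60
    print(reset_aware_increase(samples))  # 110
```

Without a reset in the window, both functions return the same number, which is why the offset query is only safe when the counter never restarted during the selected range.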
valyala
  • The question of @user7510999 has nothing to do with the issue you linked to – trallnag Jul 28 '20 at 13:08
  • Tried the offset query. It doesn't seem to work for me. It gives some results for ranges up to 1 hour, but for ranges longer than 1 hour it says no data. Also, the counts returned are incorrect. – user7510999 Jul 28 '20 at 15:26
  • Thank you for your help. As in the other reply, we are investigating how Prometheus is scraping our metrics. I will recheck whether this query works – user7510999 Jul 29 '20 at 08:37