1

I am new to Prometheus and alerting, and I couldn't find my answer in the documentation.

I have some data coming into an Elasticsearch cluster. Every day, the process creates a new index in Elasticsearch and writes that day's data to it (e.g., my_index-2019-10-06, my_index-2019-10-05, ...). I want to monitor the size of today's index and check that it is growing; if it has not grown within a defined interval (15 minutes, for example), I want to fire an alert in Prometheus. To do so, I was thinking about an expr like this in the alert rule:

expr: delta(elasticsearch_index_primary_store_size{index_name="my_index-TODAY-DATE"}[15m]) <= 0

TODAY-DATE should be dynamic and regenerated every day. But as far as I understand, you cannot have a dynamic value in a label matcher, and there is no function to get the date either. I then thought about comparing the delta of the sum of the sizes of all indices whose names start with my_index. The problem with this approach is retention: when an old index is deleted, the delta of the sum may be negative even though new data is still arriving in today's index. Do you have any solution for this problem?

Thanks in advance.

Nooshin

2 Answers

2

The problem comes from your assumption that you would be alerting based on the delta() of a sum() of timeseries, which is one of the first things the Prometheus documentation warns against. (And which, before subqueries were introduced, was impossible to do with a single query; you needed to set up recording rules to achieve that.)

If instead you're using a sum() of delta() values (and your exporter doesn't produce a zero or rapidly decreasing index size metric during deletion) you're all set. When an index is deleted, its delta will just silently disappear from the results produced by delta() and not affect the resulting sum in any way. Previous days' indexes will probably not change size and thus also not affect the sum. And in case there's e.g. compaction going on, causing index sizes to drop suddenly, you can just filter out those values:

expr: sum(delta(elasticsearch_index_primary_store_size{index_name=~"my_index-.*"}[15m]) >= 0) <= 0
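For context, here is a minimal sketch of how such an expression might sit in a Prometheus rule file. The group name, alert name, `for` duration, severity label, and annotation text are all assumptions made for illustration; the metric and the `index_name` label are taken from the question. The inner filter is `>= 0` rather than `> 0`: with `> 0`, a period in which nothing grows would leave `sum()` with no input and therefore no result, so the outer `<= 0` comparison would have nothing to match and the alert could not fire.

```yaml
groups:
  - name: elasticsearch-index-growth        # hypothetical group name
    rules:
      - alert: MyIndexNotGrowing            # hypothetical alert name
        # Keep only the per-index deltas that are not negative (drops indices
        # that are shrinking because of deletion or compaction), add them up,
        # and fire when the total growth over 15 minutes is zero.
        expr: |
          sum(
            delta(elasticsearch_index_primary_store_size{index_name=~"my_index-.*"}[15m]) >= 0
          ) <= 0
        for: 5m                             # assumed; tune to taste
        labels:
          severity: warning                 # assumed label
        annotations:
          summary: "No my_index-* index has grown in the last 15 minutes"
```

Note that this rule stays silent if the metric is missing altogether (for example, when the exporter is down), so a separate `absent()`-based rule may be worth adding for that case.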

That being said, you could generate a label with today's date as its value using count_values without() ("year", year(vector(time()))) (and likewise with month() and day_of_month()), plus label_join() / label_replace(), but you probably don't want to go there.
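To give an idea of what that route might look like, here is a rough, untested sketch. It assumes the index names are zero-padded UTC dates in the form my_index-YYYY-MM-DD, as in the question's example; the group name, alert name, and the intermediate `date` label are hypothetical. The `year()`, `month()` and `day_of_month()` functions all work in UTC, so the match will be off around midnight if the indices are named by local time.

```yaml
groups:
  - name: elasticsearch-today-index         # hypothetical group name
    rules:
      - alert: TodayIndexNotGrowing         # hypothetical alert name
        expr: |
          # Left side: indices whose size did not grow over the last 15 minutes.
          (delta(elasticsearch_index_primary_store_size{index_name=~"my_index-.*"}[15m]) <= 0)
          and on(index_name)
          # Right side: a single series labelled index_name="my_index-<today>".
          label_replace(
            label_join(
              # Zero-pad single-digit day and month values ("6" -> "06").
              label_replace(
                label_replace(
                    count_values without() ("year", year(vector(time())))
                  * on() group_left(month)
                    count_values without() ("month", month(vector(time())))
                  * on() group_left(day)
                    count_values without() ("day", day_of_month(vector(time()))),
                  "month", "0$1", "month", "(.)"
                ),
                "day", "0$1", "day", "(.)"
              ),
              "date", "-", "year", "month", "day"
            ),
            "index_name", "my_index-$1", "date", "(.+)"
          )
```

The length of this expression compared with the sum-of-deltas version is a fair illustration of why it is probably not worth it.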

Alin Sînpălean
  • Thanks for the answer. I think it makes sense. I think the `expr` will be something like this: `expr: sum(delta(elasticsearch_index_primary_store_size{index_name=~"my_index-*"}[15m]) <= 0)` If you think it makes sense, can you please modify the expr in your answer, so I can mark it as accepted? – Nooshin Nov 07 '19 at 09:18
  • Pretty close, but the condition needs to be `> 0` (or `>= 0`; or some other threshold value). You want to filter for the indices that are increasing in size, not filter them out. Edited. – Alin Sînpălean Nov 07 '19 at 13:40
  • Since I want the alert to fire when the index is not growing I need to set the condition to `<=0`. But besides that, I think I made a mistake, the condition should be outside of the `sum` function. like this: `sum(delta(elasticsearch_index_primary_store_size{index_name=~"^my_index-.*"}[15m])) <= 0` – Nooshin Nov 08 '19 at 08:43
  • Edited yet again. The condition for your alert needs to be `<= 0` indeed. The `>= 0` condition I had added was for the rate of growth calculation: you may want to filter out any indexes that are decreasing in size (e.g. because they are being deleted or compacted) in order for them to not affect your calculation. – Alin Sînpălean Nov 08 '19 at 12:12
1

Elasticsearch aliases can be used to avoid the problem of specifying the (dynamic) index name per day, see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

That is, you can use an alias that points to today's index. For example, my_favourite_today_index could point to my_favourite_index_2019-11-07 and be updated every day (by a cron job or some other method). This approach lets you specify a fixed, predefined index name in Prometheus.

Zouzias