Elastalert rule for CPU usage in percentage

Question

I am facing an issue with an ElastAlert rule for CPU usage (not load average). I am not getting any hits or matches. Below is the .yaml file for my CPU rule:

name: CPU usage
type: metric_aggregation
index: metricbeat-*
buffer_time:
  minutes: 10
metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname
doc_type: doc
bucket_interval:
  minutes: 5
sync_bucket_interval: true
max_threshold: 60.0
filter:
- term:
    metricset.name: cpu
alert:
- "email"
email:
- "xyz@xy.com"

Can you please help me figure out what changes I need to make to my rule?

Any assistance will be appreciated.

Thanks.

For anyone coming here in 2020 and beyond: in addition to the threshold change mentioned in the answer below, change metric_agg_key to system.cpu.total.norm.pct. – Shashikant Soni, Sep 25 '20 at 05:27
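Why the normalized field is easier to threshold: in Metricbeat's cpu metricset, system.cpu.total.pct can go up to the number of cores (for example 4.0, shown as 400%, on a fully busy 4-core host), while system.cpu.total.norm.pct divides by the number of cores and stays between 0 and 1. The helper below is a hypothetical illustration of that relationship, not part of Metricbeat or ElastAlert, and it assumes the rule fires on a strict ">" comparison.

```python
def to_norm_pct(total_pct: float, num_cores: int) -> float:
    """Hypothetical helper: rescale a system.cpu.total.pct value (0..num_cores)
    to the 0..1 range used by system.cpu.total.norm.pct."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return total_pct / num_cores


max_threshold = 0.6

# A 4-core host busy for the equivalent of 0.8 of one core (about 20% of the machine):
raw = 0.8
norm = to_norm_pct(raw, 4)
print(norm)                  # 0.2
print(raw > max_threshold)   # True: an alert would fire on the raw field
print(norm > max_threshold)  # False: no alert on the normalized field
```

In other words, max_threshold: 0.6 on system.cpu.total.pct means "60% of one core", which is why the normalized field is the safer choice on multi-core hosts.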

score 3 · Answer 1 · answered Oct 16 '18 at 11:45


Metricbeat reports CPU values as fractions rather than percentages (a value of 1.0 is what Kibana displays as 100%). So a threshold of 60 will never be matched.

Try it with max_threshold: 0.6 and it will probably work.
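A minimal sketch of the scale mismatch, assuming the rule fires when the bucket average is strictly greater than max_threshold. The names percent_to_fraction and exceeds are hypothetical, used only for illustration; they are not ElastAlert functions.

```python
def percent_to_fraction(percent: float) -> float:
    """Convert a human-style percentage such as 60 into Metricbeat's fractional scale (0.6)."""
    return percent / 100.0


def exceeds(avg_value: float, max_threshold: float) -> bool:
    """Simplified stand-in for a max_threshold check (assumed strict '>')."""
    return avg_value > max_threshold


fully_busy = 1.0  # what Kibana renders as 100% for a pct field

print(exceeds(fully_busy, 60.0))                     # False: 1.0 can never exceed 60
print(exceeds(fully_busy, percent_to_fraction(60)))  # True: 1.0 > 0.6
```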


Faulander


I have made the changes you suggested, but I am still not getting any hits or alerts. My CPU usage is 100%, so it should match. I checked the Kibana dashboard, and system.cpu.total.pct also shows 100%. – Tekchand Dagar Oct 17 '18 at 07:52
What does the original JSON look like? – Faulander Oct 17 '18 at 15:30

score 0 · Answer 2 · answered Dec 19 '18 at 06:30


Try reducing buffer_time and bucket_interval while testing.
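Besides returning results sooner, a shorter bucket_interval means each average covers fewer samples, so a brief CPU spike is less diluted by idle periods. The sketch below is a simplified, hypothetical model of per-host, per-bucket averaging (not ElastAlert's implementation); the host name web-01 and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_averages(samples, bucket_minutes):
    """Average (timestamp, host, pct) samples per host and per fixed-width time bucket.

    Simplified model of a date-histogram average; not ElastAlert's actual code.
    """
    width = timedelta(minutes=bucket_minutes)
    totals = defaultdict(lambda: [0.0, 0])  # (host, bucket_start) -> [sum, count]
    for ts, host, pct in samples:
        bucket_start = EPOCH + ((ts - EPOCH) // width) * width
        acc = totals[(host, bucket_start)]
        acc[0] += pct
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Five minutes of samples every 10 s: mostly 20% busy, with a one-minute spike to 100%.
start = datetime(2018, 10, 16, 11, 0, 0)
samples = [
    (start + timedelta(seconds=10 * i), "web-01", 1.0 if 12 <= i < 18 else 0.2)
    for i in range(30)
]

five_min = bucket_averages(samples, 5)
one_min = bucket_averages(samples, 1)
print(round(max(five_min.values()), 2))  # 0.36: the spike is diluted below 0.6
print(max(one_min.values()))             # 1.0: the spike's own bucket exceeds 0.6
```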


Debashish Sen


score 0 · Answer 3 · answered Aug 19 '21 at 15:54

The best way to debug an ElastAlert issue is to use the command-line option --es_debug_trace, for example --es_debug_trace /tmp/output.txt. It records the exact curl API calls that ElastAlert makes to Elasticsearch in the background. You can then copy the query into Kibana's Dev Tools to analyze and tweak it.
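If you want a starting point to experiment with in Dev Tools, the sketch below builds a request body that roughly mirrors the question's rule: a term filter on metricset.name, a 10-minute lookback (matching buffer_time), a terms aggregation on beat.hostname, 5-minute date_histogram buckets and an avg on system.cpu.total.pct. It is a hand-written approximation, not the exact body ElastAlert sends; the --es_debug_trace output remains the authoritative version. The build_debug_query name is hypothetical.

```python
import json


def build_debug_query(host_field="beat.hostname", metric_field="system.cpu.total.pct",
                      interval="5m", lookback="now-10m"):
    """Hypothetical helper: approximate body for `POST metricbeat-*/_search` in Dev Tools."""
    return {
        "size": 0,  # only the aggregations are of interest
        "query": {
            "bool": {
                "filter": [
                    {"term": {"metricset.name": "cpu"}},
                    {"range": {"@timestamp": {"gte": lookback}}},
                ]
            }
        },
        "aggs": {
            "per_host": {
                "terms": {"field": host_field},
                "aggs": {
                    "per_bucket": {
                        # On Elasticsearch 7.2+ use "fixed_interval"; "interval" is deprecated there.
                        "date_histogram": {"field": "@timestamp", "interval": interval},
                        "aggs": {"avg_cpu": {"avg": {"field": metric_field}}},
                    }
                },
            }
        },
    }


print(json.dumps(build_debug_query(), indent=2))
```

If this returns no hosts at all, the filter or field names are the problem rather than the threshold.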

Most likely, the doc_type: doc setting caused the ES endpoint to look like metricbeat-*/doc/_search. If your indices contain no documents of type doc, that search returns nothing, hence no match. Please remove doc_type and try again.
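The path difference described in this answer can be illustrated with a tiny hypothetical helper: when a doc_type is set, it becomes part of the search URL and restricts the search to that type; without it, the search covers every document in the matching indices. This only models the shape of the path, not ElastAlert's client code.

```python
from typing import Optional


def search_path(index: str, doc_type: Optional[str] = None) -> str:
    """Return the _search path, including the mapping type only when one is given."""
    if doc_type:
        return f"/{index}/{doc_type}/_search"
    return f"/{index}/_search"


print(search_path("metricbeat-*", "doc"))  # /metricbeat-*/doc/_search
print(search_path("metricbeat-*"))         # /metricbeat-*/_search
```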

Also note that the pct value is a fraction, not a percentage, so for your case use max_threshold: 0.6. For reference, the following works for me:

name: CPU usage

type: metric_aggregation

use_strftime_index: true
index: metricbeat-system.cpu-%Y.%m.%d

buffer_time:
  hours: 1

metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname

min_doc_count: 1
  
bucket_interval:
  minutes: 5

max_threshold: 0.6

filter:
- term:
    metricset.name: cpu

realert:
  hours: 2
...

Sample match output:

{
 '@timestamp': '2021-08-19T15:06:22Z',
 'beat.hostname': 'MY_BUSY_SERVER',
 'metric_system.cpu.total.pct_avg': 0.6155,
 'num_hits': 50,
 'num_matches': 10
}
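Because the avg field holds a fraction, a short snippet can render the sample match back as a percentage and confirm why it fired (0.6155 is above max_threshold: 0.6). This is only a local illustration using the values from the sample output, not ElastAlert's own alert text.

```python
# Values copied from the sample match output; max_threshold comes from the rule.
sample_match = {
    "@timestamp": "2021-08-19T15:06:22Z",
    "beat.hostname": "MY_BUSY_SERVER",
    "metric_system.cpu.total.pct_avg": 0.6155,
    "num_hits": 50,
    "num_matches": 10,
}
max_threshold = 0.6

avg = sample_match["metric_system.cpu.total.pct_avg"]
summary = f"{sample_match['beat.hostname']}: {avg:.2%} average CPU (threshold {max_threshold:.0%})"
print(summary)              # MY_BUSY_SERVER: 61.55% average CPU (threshold 60%)
print(avg > max_threshold)  # True
```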
