I am facing an issue with an ElastAlert rule for CPU usage (not load average). I am not getting any hits or matches. Below is my .yaml file for the CPU rule:

name: CPU usage
type: metric_aggregation
index: metricbeat-*
buffer_time:
  minutes: 10
metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname
doc_type: doc
bucket_interval:
  minutes: 5
sync_bucket_interval: true
max_threshold: 60.0
filter:
- term:
    metricset.name: cpu
alert:
- "email"
email:
- "xyz@xy.com"

Can you please help me with what changes I need to make to my rule?

Any assistance will be appreciated.

Thanks.

Tekchand Dagar
  • For anyone coming here in 2020 and beyond: change `metric_agg_key` to `system.cpu.total.norm.pct`, in addition to the percentage threshold change mentioned in the answer below. – Shashikant Soni Sep 25 '20 at 05:27

3 Answers

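Metricbeat reports CPU percentages as fractions (1.0 means 100%), not on a 0 to 100 scale, so a threshold of 60 will never be matched. Note that system.cpu.total.pct can exceed 1.0 on multi-core machines, while system.cpu.total.norm.pct is normalized by the number of cores and stays between 0 and 1.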

Try it with max_threshold: 0.6 and it probably will work.
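To make the unit mismatch concrete, here is a minimal Python sketch. The helper names are hypothetical, and the comparison is a simplified model of a max_threshold check (match when the aggregated value is strictly above the threshold), not ElastAlert's source code.

```python
def percent_to_fraction(percent: float) -> float:
    """Convert a 0-100 percentage into the fractional scale of Metricbeat *.pct fields."""
    return percent / 100.0


def crosses_max_threshold(avg_value: float, max_threshold: float) -> bool:
    """Simplified max_threshold check: match when the bucket average is above the threshold."""
    return avg_value > max_threshold


# A bucket whose average CPU is 95% is stored by Metricbeat as 0.95.
bucket_avg = 0.95

print(crosses_max_threshold(bucket_avg, 60.0))                     # False: 0.95 is never above 60
print(crosses_max_threshold(bucket_avg, percent_to_fraction(60)))  # True: 0.95 > 0.6
```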

Faulander
  • I have made the changes you suggested, but I am still not getting any hits or alerts. My CPU usage is `100%`, so it should match. I have checked the Kibana dashboard, and system.cpu.total.pct also shows 100%. – Tekchand Dagar Oct 17 '18 at 07:52
  • What does the original JSON look like? – Faulander Oct 17 '18 at 15:30

Try reducing buffer_time and bucket_interval while testing.
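A small sketch of the arithmetic involved: it counts how many bucket_interval buckets fit in buffer_time, so you can see how much data each evaluation covers. It assumes (from my reading of ElastAlert's behaviour, not verified for every version) that buffer_time should be a whole multiple of bucket_interval; bucket_count is a hypothetical name.

```python
from datetime import timedelta


def bucket_count(buffer_time: timedelta, bucket_interval: timedelta) -> int:
    """Number of bucket_interval buckets that fit in buffer_time.

    Raises ValueError when buffer_time is not a whole multiple of bucket_interval,
    which this sketch treats as a misconfiguration (an assumption, see above).
    """
    if bucket_interval <= timedelta(0):
        raise ValueError("bucket_interval must be positive")
    if buffer_time % bucket_interval != timedelta(0):
        raise ValueError("buffer_time is not a whole multiple of bucket_interval")
    return buffer_time // bucket_interval


# Settings from the question: 10-minute buffer, 5-minute buckets.
print(bucket_count(timedelta(minutes=10), timedelta(minutes=5)))  # 2
# A shorter test setup: 2-minute buffer, 1-minute buckets.
print(bucket_count(timedelta(minutes=2), timedelta(minutes=1)))   # 2
```

With one-minute buckets each bucket covers a minute instead of five, so a test load shows up in the averages much sooner.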

Debashish Sen

The best way to debug an ElastAlert issue is to use the command-line option --es_debug_trace, like this: --es_debug_trace /tmp/output.txt. It logs the exact curl calls that ElastAlert makes to Elasticsearch in the background. You can then copy the query into Kibana's Dev Tools to analyse it and experiment with it.
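If you prefer to script the replay instead of pasting it into Dev Tools, a minimal standard-library Python sketch could look like this. The Elasticsearch URL (http://localhost:9200), the helper name and the query body are illustrative assumptions: the body is a hand-written average over system.cpu.total.pct, not the exact query ElastAlert sends, so substitute the one from your trace file.

```python
import json
import urllib.request


def build_search_request(es_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build a POST request to <es_url>/<index>/_search with a JSON body (hypothetical helper)."""
    url = f"{es_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hand-written example body: average CPU over cpu metricset documents, no hits returned.
body = {
    "size": 0,
    "query": {"term": {"metricset.name": "cpu"}},
    "aggs": {"avg_cpu": {"avg": {"field": "system.cpu.total.pct"}}},
}

req = build_search_request("http://localhost:9200", "metricbeat-*", body)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/metricbeat-*/_search

# Sending it requires a reachable cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["aggregations"]["avg_cpu"]["value"])
```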

Most likely, the doc_type: doc setting caused the ES endpoint to look like this: metricbeat-*/doc/_search. If your documents are not of type doc, nothing matches. Please remove doc_type and try again.
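The effect described here can be modelled with a tiny Python function. It is a simplified illustration of how a document type ends up in the request path, not ElastAlert's or elasticsearch-py's actual code; search_path is a hypothetical name.

```python
from typing import Optional


def search_path(index: str, doc_type: Optional[str] = None) -> str:
    """Return the _search endpoint path, inserting the doc type when one is configured."""
    if doc_type:
        return f"/{index}/{doc_type}/_search"
    return f"/{index}/_search"


print(search_path("metricbeat-*", "doc"))  # /metricbeat-*/doc/_search (restricted to type "doc")
print(search_path("metricbeat-*"))         # /metricbeat-*/_search (all documents in the indices)
```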

Also note that the pct value is less than 1, so in your case use max_threshold: 0.6. For reference, the following works for me:

name: CPU usage

type: metric_aggregation

use_strftime_index: true
index: metricbeat-system.cpu-%Y.%m.%d

buffer_time:
  hours: 1

metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname

min_doc_count: 1
  
bucket_interval:
  minutes: 5

max_threshold: 0.6

filter:
- term:
    metricset.name: cpu

realert:
  hours: 2
...
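The config above relies on use_strftime_index: ElastAlert formats the index with strftime for each query, and when the query spans several days the resulting index names are separated by commas. The sketch below illustrates that idea, assuming one index per UTC day; the function name is hypothetical and ElastAlert's own implementation may differ in details.

```python
from datetime import datetime, timedelta


def expand_strftime_index(pattern: str, start: datetime, end: datetime) -> str:
    """Expand a strftime index pattern for every day from start to end, comma-separated."""
    names = []
    day = start.date()
    while day <= end.date():
        name = day.strftime(pattern)
        if name not in names:  # avoids duplicates for coarser patterns such as %Y.%m
            names.append(name)
        day += timedelta(days=1)
    return ",".join(names)


# A one-hour window that crosses midnight (UTC) touches two daily indices.
print(expand_strftime_index(
    "metricbeat-system.cpu-%Y.%m.%d",
    datetime(2021, 8, 18, 23, 30),
    datetime(2021, 8, 19, 0, 30),
))
# metricbeat-system.cpu-2021.08.18,metricbeat-system.cpu-2021.08.19
```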

Sample match output:

{
  '@timestamp': '2021-08-19T15:06:22Z',
  'beat.hostname': 'MY_BUSY_SERVER',
  'metric_system.cpu.total.pct_avg': 0.6155,
  'num_hits': 50,
  'num_matches': 10
}
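The aggregated field in the sample match appears to be named after the rule settings (metric_ + metric_agg_key + _ + metric_agg_type, as the output suggests). A short Python sketch that reads the value back from the sample match and checks it against the 0.6 threshold; metric_field_name is a hypothetical helper.

```python
def metric_field_name(metric_agg_key: str, metric_agg_type: str) -> str:
    """Build the field name seen in the sample match: metric_<agg_key>_<agg_type>."""
    return f"metric_{metric_agg_key}_{metric_agg_type}"


# The sample match output, as a Python dict.
match = {
    "@timestamp": "2021-08-19T15:06:22Z",
    "beat.hostname": "MY_BUSY_SERVER",
    "metric_system.cpu.total.pct_avg": 0.6155,
    "num_hits": 50,
    "num_matches": 10,
}

field = metric_field_name("system.cpu.total.pct", "avg")
value = match[field]
print(field, value, value > 0.6)  # metric_system.cpu.total.pct_avg 0.6155 True
```

An average of 0.6155 corresponds to roughly 62% CPU, which is why this bucket matched with max_threshold: 0.6.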
Sachin Dangol