Set up a watcher to alert on high CPU usage by a process

Question

I'm trying to create a Watcher alert that triggers when a process on a node uses more than 95% of CPU (a system.process.cpu.total.norm.pct value of 0.95 or higher) over the last hour.

Here is an example of my config:

{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metricbeat*"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "system.process.cpu.total.norm.pct": {
                      "gte": 0.95
                    }
                  }
                },
                {
                  "range": {
                    "system.process.cpu.start_time": {
                      "gte": "now-1h"
                    }
                  }
                },
                {
                  "match": {
                    "environment": "test"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send-to-slack": {
      "throttle_period_in_millis": 1800000,
      "webhook": {
        "scheme": "https",
        "host": "hooks.slack.com",
        "port": 443,
        "method": "post",
        "path": "{{ctx.metadata.onovozhylov-test}}",
        "params": {},
        "headers": {
          "Content-Type": "application/json"
        },
        "body": "{ \"text\": \" ==========\nTest parameters:\n\tthrottle_period_in_millis: 60000\n\tInterval: 1m\n\tcpu.total.norm.pct: 0.5\n\tcpu.start_time: now-1m\n\nThe watcher:*{{ctx.watch_id}}* in env:*{{ctx.metadata.env}}* found that the process *{{ctx.system.process.name}}* has been utilizing CPU over 95% for the past 1 hr on node:\n{{#ctx.payload.nodes}}\t{{.}}\n\n{{/ctx.payload.nodes}}\n\nThe runbook entry is here: *{{ctx.metadata.runbook}}* \"}"
      }
    }
  },
  "metadata": {
    "onovozhylov-test": "/services/T0U0CFMT4/BBK1A2AAH/MlHAF2QuPjGZV95dvO11111111",
    "env": "{{ grains.get('environment') }}",
    "runbook": "http://mytest.com"
  }
}

This watch doesn't work when I add the range filter on system.process.cpu.start_time. Perhaps that isn't the right field to use, but I don't have enough experience with Watcher to solve this on my own.

The other issue is that I don't know how to include system.process.name in the message body.

Thanks in advance for any help!

score 1 · Answer 1 · answered Oct 24 '18 at 00:20

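Use the @timestamp field instead of system.process.cpu.start_time to check all metricbeat-* documents from the last 10 minutes. system.process.cpu.start_time is the time the process was started, not the time the sample was collected, so a "gte": "now-1h" range on that field only matches processes that started within the last hour.

"range": {
    "@timestamp": {
        "gte": "now-10m",
        "lte": "now"
    }
}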
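Applied to the question's watch, the input could look roughly like the sketch below. It assumes Metricbeat's default @timestamp field and keeps the question's metricbeat* pattern, environment filter, 0.95 threshold and condition. Instead of "size": 0 it returns up to 10 hits sorted by CPU usage, so the matching documents (and their process names) are available to the action; the size of 10 and the sort order are illustrative choices, not requirements.

```json
{
  "input": {
    "search": {
      "request": {
        "indices": [
          "metricbeat*"
        ],
        "body": {
          "size": 10,
          "sort": [
            { "system.process.cpu.total.norm.pct": { "order": "desc" } }
          ],
          "query": {
            "bool": {
              "must": [
                { "range": { "system.process.cpu.total.norm.pct": { "gte": 0.95 } } },
                { "range": { "@timestamp": { "gte": "now-10m", "lte": "now" } } },
                { "match": { "environment": "test" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": { "gt": 0 }
    }
  }
}
```

Note that this fires on any single sample above the threshold within the window. Checking that a process stayed above 95% for a whole hour would need something like a per-process average over now-1h, for example via an aggregation.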

To include system.process.name in your message body, look at the structure of {{ctx.payload}} and use the matching Mustache path to refer to the process name. For example, in one of our watcher configs we use {{_source.appname}} to refer to the application name.
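A sketch of the question's Slack action rewritten along those lines. It assumes the search actually returns documents (size greater than 0; with the question's "size": 0, ctx.payload.hits.hits is empty), and the message wording is illustrative. {{#ctx.payload.hits.hits}} and {{/ctx.payload.hits.hits}} open and close a Mustache section that renders once per hit, and inside it {{_source.system.process.name}} resolves against that hit's document. The webhook host, path, throttle and metadata references are kept from the question.

```json
{
  "actions": {
    "send-to-slack": {
      "throttle_period_in_millis": 1800000,
      "webhook": {
        "scheme": "https",
        "host": "hooks.slack.com",
        "port": 443,
        "method": "post",
        "path": "{{ctx.metadata.onovozhylov-test}}",
        "headers": {
          "Content-Type": "application/json"
        },
        "body": "{ \"text\": \"The watcher *{{ctx.watch_id}}* in env *{{ctx.metadata.env}}* found processes above 95% CPU:\\n{{#ctx.payload.hits.hits}}\\t{{_source.system.process.name}} ({{_source.system.process.cpu.total.norm.pct}})\\n{{/ctx.payload.hits.hits}}The runbook entry is here: *{{ctx.metadata.runbook}}*\"}"
      }
    }
  }
}
```

The newline and tab are written as \\n and \\t so that the rendered HTTP body contains the escape sequences \n and \t inside the Slack JSON string. A plain \n in the watch definition produces a raw newline character there instead, and raw control characters are not allowed inside JSON strings.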
