monitoring multiple services for performance and health

Question

Here are our requirements.

Measure near-real-time average latency of our web pages, which are hosted on multiple AWS ECS instances. We want our service to serve a page in, say, less than a second.

Make sure responses with a status other than HTTP 200 don't spike; we want to know if there's a problem.

Make sure separate services such as Elasticsearch are not down.

We log some critical errors (such as a failed purchase) in Sentry or Elasticsearch and want to be alerted if they spike.

It would be nice to have a single monitoring UI, and an alarm that fires when certain conditions are met.

I don't know whether we need to build a service ourselves; I'm hoping we can use a ready-made one.

Where should we collect the data?
I've been looking at:

Elasticsearch, Kibana (lacks alarms)
StatsD (seems to need a separate front end for visualization)
Netdata (looks more like a system monitoring tool than a data aggregation tool)
Munin, Nagios (not sure if these are what we need)

score 0 · Answer 1 · answered Nov 09 '19 at 13:45

It seems like DataDog could be a good solution for you. You can use it to monitor Elasticsearch, and it has an APM product you can integrate into your app to monitor its performance. If you monitor your app with Honeybadger, you can send metrics on those errors to DataDog, too.
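As a rough illustration of how application metrics could reach such a service, here is a minimal sketch that formats latency and error-count metrics in the plain StatsD line format and sends them over UDP. It assumes a StatsD-compatible listener (for example the DataDog agent's DogStatsD) on 127.0.0.1:8125; the metric names `web.page.latency` and `web.http.non200` and the helper functions are hypothetical, not part of any library.

```python
import socket


def format_timing(name: str, millis: float) -> bytes:
    """Build a StatsD timing datagram: <name>:<value>|ms."""
    return f"{name}:{millis:.1f}|ms".encode("ascii")


def format_counter(name: str, count: int = 1) -> bytes:
    """Build a StatsD counter datagram: <name>:<count>|c."""
    return f"{name}:{count}|c".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(format_timing("web.page.latency", 123.4))
    send_metric(format_counter("web.http.non200"))
```

In practice an official client library would normally replace these helpers; the sketch only shows what travels over the wire.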

score 0 · Answer 2 · answered Nov 13 '19 at 04:12

Zabbix can handle most of these well.

With a "web scenario", web page latency can be measured via "web.test.in", "web.test.time"...
Also, "web.test.rspcode" reports the HTTP response code, so you can trigger an alert where needed (for example: 200, 400, 401, 404, 500, 503...).
You can easily monitor Elasticsearch using the official template with some extra zabbix_agentd settings.
Watching the logged critical errors might require additional dedicated services (Sentry, an EFK stack, ElastAlert) to achieve the goal.
Zabbix has its own centralized UI (with proxy support) and many kinds of alerts (shown on the dashboard, email, Slack, SMS, PagerDuty...).
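
To make the latency and response-code checks concrete, here is a minimal standalone sketch of what such a check does: time a request, record the status code, then compare the average latency and the share of non-200 responses against thresholds. This is not Zabbix's implementation; the `probe` and `evaluate` helpers, the 1-second latency limit (taken from the question), and the 5% error ratio are all assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from statistics import mean


def probe(url: str, timeout: float = 5.0) -> tuple[int, float]:
    """Return (status_code, seconds). Status 0 means no HTTP response at all."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        status = 0  # DNS failure, refused connection, timeout, ...
    return status, time.monotonic() - start


def evaluate(samples, max_avg_seconds=1.0, max_error_ratio=0.05):
    """Return alert messages for a list of (status, seconds) samples."""
    alerts = []
    if not samples:
        return alerts
    avg = mean(seconds for _, seconds in samples)
    errors = sum(1 for status, _ in samples if status != 200)
    ratio = errors / len(samples)
    if avg > max_avg_seconds:
        alerts.append(f"average latency {avg:.2f}s exceeds {max_avg_seconds:.2f}s")
    if ratio > max_error_ratio:
        alerts.append(f"non-200 ratio {ratio:.0%} exceeds {max_error_ratio:.0%}")
    return alerts
```

A scheduler would call `probe` periodically for each page, keep a sliding window of samples, and pass that window to `evaluate`.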
