In Stackdriver, creating an Uptime Check gives you access to the Uptime Dashboard, which shows the uptime percentage of your service:

[Screenshot: uptime percentage on the Uptime Dashboard]

My problem is that uptime checks are restricted to HTTP/TCP checks. I have other services running that report their health in different ways (for example, by whether a specific process is running). I already have incident policies set up for these services, so I get notified if a service is not running.

Now I want to be able to look back and find out how long a service was down during the last hour. Is there a way to do that?

robertokl
  • Have you tried the duration field on the incident page for the service? It shows the duration of the incident tied to your service outage, which reflects how long the service was down. – D Saini Jan 19 '18 at 01:25
  • @DSaini Yes, that would work, but I couldn't find a way to access it from an API so I could calculate the actual uptime. Other than manually summing the time of all incidents for each machine, do you see any other way I could achieve what I want? – robertokl Jan 19 '18 at 13:43
  • You can create, list, edit and delete uptime check configurations through the [Stackdriver API](https://cloud.google.com/monitoring/uptime-checks/management#monitoring-uptime-check-create-api). You may also want to review the available [Stackdriver Monitoring API](https://cloud.google.com/monitoring/api/ref_v3/rest/). I'm not sure whether it exposes the exact uptime/downtime duration for monitored resources, but it is worth trying if you haven't already. – D Saini Jan 19 '18 at 23:41
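
For the "sum the time of all incidents" approach mentioned in the comments, here is a minimal Python sketch of the arithmetic. It assumes you have already collected each incident's open and close times by some means (for example, copied from the incident page); the function name `downtime_in_window` and the `(opened, closed)` tuple shape are hypothetical and are not part of any Stackdriver API.

```python
from datetime import datetime, timedelta


def downtime_in_window(incidents, window_start, window_end):
    """Sum the incident time that falls inside [window_start, window_end).

    incidents: iterable of (opened, closed) datetimes. closed may be None
    for an incident that is still open (hypothetical input shape).
    """
    # Clip each incident to the window and drop those entirely outside it.
    clipped = []
    for opened, closed in incidents:
        start = max(opened, window_start)
        end = min(closed or window_end, window_end)
        if start < end:
            clipped.append((start, end))

    # Merge overlapping intervals so simultaneous incidents are counted once.
    total = timedelta(0)
    current_start = current_end = None
    for start, end in sorted(clipped):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

Passing the last hour as the window gives the downtime for that hour; dividing by the window length gives the downtime fraction.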

1 Answer

There's no way to programmatically retrieve alerts at the moment, unfortunately. Many resource types expose uptime as a metric, though (e.g., instance/uptime on GCE instances). Could you pull those and do the math on them? Without knowing what resource types you're using, it's hard to give specific suggestions.
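
As a rough illustration of "doing the math", the Python sketch below assumes you have already fetched the uptime samples for the window and that each sample reports the number of seconds the resource was up during its sampling interval (a delta-style metric). That assumption, the function name `downtime_from_uptime_deltas`, and the choice to treat missing samples as downtime are all illustrative; check how your resource type actually reports uptime before relying on it.

```python
def downtime_from_uptime_deltas(uptime_deltas, window_seconds):
    """Estimate downtime for a window from per-interval uptime samples.

    uptime_deltas: seconds-up values, one per sampling interval
                   (assumed delta-style metric, not a cumulative counter).
    window_seconds: length of the window being examined, e.g. 3600.
    Returns (downtime_seconds, uptime_percent).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    # Intervals with no sample contribute nothing, so they count as downtime.
    reported_up = sum(uptime_deltas)

    # Guard against samples that slightly overshoot the window boundary.
    up = min(reported_up, window_seconds)

    down = window_seconds - up
    return down, 100.0 * up / window_seconds


# Example: 55 one-minute samples of 60 s each within a one-hour window.
down, pct = downtime_from_uptime_deltas([60.0] * 55, 3600)
```

This only works for resources that publish such a metric; for a process-based health signal you would need an equivalent custom metric.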

Aaron Sher, Stackdriver engineer

Aaron Sher